R "mi" package: mi(...) command throws error - r

I am attempting to do multiple imputation with the "mi" package (v1.0). Due to computing/processing time constraints, I split my code into two files. The first does all of the mi-style preprocessing, and the second actually runs the imputation.
The first file runs without error, but I am including it here for completeness (below is an edited, shorter version of the file):
require(mi)
# Load data for multiple imputation
data = as.data.frame(read.delim("for_mi.csv"))
...
# Declare data as missing data frame for MI functions
mdf = missing_data.frame(data)
mdf <- change(mdf, y = "x", what = "type", to = "nonnegative-continuous")
... (many type corrections later) ...
mdf <- change(mdf, y = "y", what = "type", to = "positive-continuous")
# Save pre-processed missing-format data for analysis in r_mi_2.R.
saveRDS(mdf,"preprocessed.rds")
The second file is the one that throws the error:
require(mi)
# Load output from first file
mdf <- readRDS("preprocessed.rds")
# Note: at this point, mdf loads as a missing_data.frame.
# MI commands such as show(mdf) function as expected.
# Impute data
imputations <- mi(mdf, n.iter = 30, n.chains = 4, max.minutes = Inf, parallel = TRUE)
I get the following output:
Chain 1
Chain 1 Iteration 0
Chain 2
Chain 2 Iteration 0
Chain 1 Iteration 1
Chain 3
Chain 3 Iteration 0
Chain 2 Iteration 1
Chain 4
Chain 4 Iteration 0
Chain 3 Iteration 1
Chain 4 Iteration 1
Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: cannot open the connection
Calls: mi ... clusterApply -> staticClusterApply -> checkForRemoteErrors
Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/var/tmp/Rtmp0TqkWn/mi1502972500/pars_1.csv': No such file or directory
Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/var/tmp/Rtmp0TqkWn/mi1502972500/pars_2.csv': No such file or directory
Execution halted
Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/var/tmp/Rtmp0TqkWn/mi1502972500/pars_3.csv': No such file or directory
Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/var/tmp/Rtmp0TqkWn/mi1502972500/pars_4.csv': No such file or directory
Other background info:
I am running the code on a cluster, using 8 processors on a single node. I have also tried running it locally on my computer, with the same result.
I have tried varying the number of chains, lowering the number of iterations, and setting parallel = FALSE, all to no avail.
I have tried running the code with and without the options(mc.cores = 4) line that appears in the mi vignette (page 4).
The mi vignette states that many errors from running mi stem from running out of RAM. I'm not sure how to test this, but for reference: the starting object mdf is 128 MB, the per-user memory cap on the cluster is 32 GB, and the error occurs even with parallel = FALSE or n.iter = 1.
Any help would be greatly appreciated. Thank you!
Edit: The error output with parallel = FALSE is:
Chain 1
Chain 1 Iteration 0
Chain 1 Iteration 1
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
Calls: mi -> mi -> .local -> .mi -> write.table -> file
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/var/tmp/Rtmp0TqkWn/mi1502972500/pars_1.csv': No such file or directory
Execution halted
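Edit 2: The failing paths all point at /var/tmp/Rtmp0TqkWn/mi1502972500, a temporary directory created by the first R session. After saveRDS()/readRDS() and a restart, that directory no longer exists, so mi cannot write its pars_*.csv files. A hedged workaround sketch, assuming the missing_data.frame carries its scratch directory in a workpath slot (my reading of mi 1.0, not confirmed by the documentation), is to recreate the directory before calling mi:
require(mi)
mdf <- readRDS("preprocessed.rds")
# The loaded object may still reference the first session's temp directory.
# Recreate it if it is gone (the workpath slot name is an assumption for mi 1.0).
if ("workpath" %in% slotNames(mdf) && !dir.exists(mdf@workpath)) {
  dir.create(mdf@workpath, recursive = TRUE)
}
imputations <- mi(mdf, n.iter = 30, n.chains = 4, max.minutes = Inf, parallel = TRUE)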

Related

Re-run chunk automatically if error occurs in R [duplicate]

I have the following API call function:
df.load <- loaddf(r = as.character(readline(prompt = "Please enter r parameter: ")),
                  y = as.character(readline(prompt = "Please enter y parameter: ")),
                  z = as.character(readline(prompt = "Please enter z parameter: ")),
                  format = "json")
# Convert to data frame
data.df <- as.data.frame(df.load)
However, sometimes it fails to connect:
Warning in file(file, "rt") :
InternetOpenUrl failed: 'A connection with the server could not be established'
Error in file(file, "rt") : cannot open the connection
and the number of observations in the final data frame data.df is 0; but when I re-run the code 3-4 times, it works.
I therefore want R to automatically re-run this chunk of the .Rmd file if the connection failed or data.df has 0 observations.
Or, if you have any other ideas on how to solve this issue, I am open to your suggestions.
Thanks
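One way to do this (a sketch under stated assumptions: loaddf() and its parameters are the asker's own API wrapper, and the retry count and pause are arbitrary) is to wrap the call in a loop that retries on an error or an empty result:
# Retry wrapper: re-attempts the call until it returns rows or the
# maximum number of tries is exhausted.
load_with_retry <- function(max_tries = 5, ...) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(as.data.frame(loaddf(...)),
                       error = function(e) NULL)
    if (!is.null(result) && nrow(result) > 0) return(result)
    Sys.sleep(2)  # brief pause before retrying the connection
  }
  stop("loaddf() still failing or empty after ", max_tries, " attempts")
}
# Usage, mirroring the original call:
data.df <- load_with_retry(r = readline("Please enter r parameter: "),
                           y = readline("Please enter y parameter: "),
                           z = readline("Please enter z parameter: "),
                           format = "json")
In an R Markdown document this keeps the chunk self-contained; the readline() prompts only make sense interactively, so you may want to hard-code the parameters when knitting.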

Error in file(file, "rt") : invalid 'description' argument when running R script

I am trying to reproduce this protocol for DNA sequencing data analysis. It requires running this bash script, which calls an R script. However, I am getting an error (see bottom) that I can't seem to solve.
#!/bin/bash
Project_dir=~/base
cd /${Project_dir}
SCRTP=~/scRepliseq-Pipeline
OUTNAME="bam/G1_F121_A1.adapter_filtered2"
genome_name="mm10"
bamfile=${OUTNAME}.${genome_name}.clean_srt_markdup.bam
rscript=${SCRTP}/util/Step3_R-Aneu-Fragment-bins.R
out_dir="Aneu_analysis"
Name='$bamfile'
Name=${name%.adapter_filtered2.${genome_name}.clean_srt_markdup.bam}
blacklist=~/blacklist/mm10-blacklist-v1_id.bed
genome_file=~/reference/UCSC_hg19_female.fa.fai
mkdir -p ${out_dir}
Rscript --vanilla $rscript ${bamfile} ${out_dir} ${name} ${blacklist} ${genome_file}
it links to this R script
args = commandArgs(TRUE)
bamfile=args[1]
out_dir=args[2]
name=args[3]
blacklist=args[4]
genome_file=args[5]
options(scipen=100)
##Extension of file name##
ext="_mapq10_blacklist_fragment.Rdata"
ext2="_mapq10_blacklist_bin.Rdata"
library(AneuFinder)
##loading black list and genome Info##
genome_tmp <- read.table(genome_file,sep="\t") #UCSC_mm9.woYwR.fa.fai
genome=data.frame(UCSC_seqlevel=genome_tmp$V1,UCSC_seqlength=genome_tmp$V2)
chromosomes=as.character(genome$UCSC_seqlevel)
##setup output directories##
out_dir_f=paste0(out_dir,"/fragment")
out_dir_b=paste0(out_dir,"/bins")
dir.create(out_dir,showWarnings = FALSE)
dir.create(out_dir_f,showWarnings = FALSE)
dir.create(out_dir_b,showWarnings = FALSE)
##save the fragment file (>10 MAPQ), filtering out the blacklist regions##
raw_reads=bam2GRanges(bamfile,remove.duplicate.reads = TRUE,min.mapq = 10,blacklist = blacklist)
save(raw_reads,file = paste0(out_dir_f,"/",name,ext))
##save the bin data file ##
bins_reads=binReads(raw_reads,
assembly=genome,
chromosomes=chromosomes,
binsizes=c(40000,80000,100000,200000,500000))
rpm=1000000/length(raw_reads)
bins_reads[["rpm"]]=rpm
save(bins_reads,file=paste(out_dir_b,"/",name,ext2,sep=""))
It shows this error:
Error in file(file, "rt") : invalid 'description' argument
Calls: read.table -> file
Execution halted
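A likely culprit (my reading of the script, not something confirmed by the protocol authors): the shell script assigns Name with a capital N, and in literal single quotes so $bamfile is never expanded, but later expands ${name}, which is empty. Rscript therefore receives only four arguments, args[5] is NA, and file(NA, "rt") inside read.table() fails with "invalid 'description' argument". The shell-side fix would be one consistent lowercase variable without quotes (name=$bamfile). On the R side, a hedged guard sketch at the top of the script makes this failure mode obvious:
args = commandArgs(TRUE)
# Fail fast with a usage message instead of letting file() choke on NA.
if (length(args) < 5 || any(is.na(args) | args == "")) {
  stop("Usage: Rscript Step3_R-Aneu-Fragment-bins.R <bamfile> <out_dir> <name> <blacklist> <genome_file>")
}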

Recommenderlab running into memory issues

I am trying to compare some recommender algorithms against each other but am running into some memory issues. The dataset that I am using is https://drive.google.com/open?id=0By5yrncwiz_VZUpiak5Hc2l3dkE
Following is my code:
library(recommenderlab)
library(Matrix)
Amazon <- read.csv("path/to/Reviews.csv", header = TRUE,
                   col.names = c("ID","ProductId","UserId","HelpfulnessNumerator","HelpfulnessDenominator","Score",
                                 "Time","Summary","Text"),
                   colClasses = c("NULL","character","character","NULL","NULL","integer","NULL","NULL","NULL"))
Amazon <- Amazon[, c("UserId","ProductId","Score")]
Amazon <- Amazon[!duplicated(Amazon[1:2]), ]  ## keep unique (user, product) pairs
r <- as(Amazon, "realRatingMatrix")           ## coerce to recommenderlab's rating matrix
scheme <- evaluationScheme(r, method = "split", train = 0.7,
                           k = 1, given = 1, goodRating = 4)
algorithms <- list(
  "user-based CF" = list(name = "UBCF", param = list(normalize = "Z-score",
                                                     method = "Cosine",
                                                     nn = 50, minRating = 3)),
  "item-based CF" = list(name = "IBCF", param = list(normalize = "Z-score"))
)
results <- evaluate(scheme, algorithms, n = c(1, 3, 5))
I get the following errors:
UBCF run fold/sample [model time/prediction time]
1 Timing stopped at: 1.88 0 1.87
Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
IBCF run fold/sample [model time/prediction time]
1 Timing stopped at: 4.93 0.02 4.95
Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Warning message:
In .local(x, method, ...) :
Recommender 'user-based CF' has failed and has been removed from the results!
Recommender 'item-based CF' has failed and has been removed from the results!
I tried to use the recommenderlabrats package (https://github.com/sanealytics/recommenderlabrats), which I thought would solve this problem, but could not install it.
It gave me some errors which I am not able to make sense of:
c:/rbuildtools/3.3/gcc-4.6.3/bin/../lib/gcc/i686-w64- mingw32/4.6.3/../../../../i686-w64-mingw32/bin/ld.exe: cannot find -llapack
collect2: ld returned 1 exit status
Then I came across this link for solving the recommenderlabrats problem, but it did not work for me:
Error while installing package from github in R. Error in dyn.load
Any help on how to get around the memory issue is appreciated.
I am the author of recommenderlabrats. Try installing it now; it should be fixed. Then use RSVD/ALS to solve this: even though your matrix is sparse, it is too big for your computer.
Also, it might be a good idea to experiment with a smaller sample before spending on an AWS memory instance.
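A sketch of that last suggestion (the 5,000-user cutoff is illustrative, not a recommendation from the package author; algorithms is the list defined in the question):
library(recommenderlab)
r <- as(Amazon, "realRatingMatrix")
# Evaluate on a random subset of users first so the dense intermediate
# matrices stay small enough for CHOLMOD.
set.seed(42)
r_small <- r[sample(nrow(r), min(5000, nrow(r))), ]
scheme <- evaluationScheme(r_small, method = "split", train = 0.7,
                           k = 1, given = 1, goodRating = 4)
results <- evaluate(scheme, algorithms, n = c(1, 3, 5))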

HANA RLANG temporary file cannot open the connection

I'm using SAP HANA together with R. I use the cSPADE implementation in R, and therefore load the data into a temporary file.
Until now this has worked fine, but I am suddenly getting an error. It seems to be a problem with the temporary file.
Could not execute 'CALL "BETA_SYSTEM"."run"()' in 15.741 seconds .
SAP DBTech JDBC: [2048]: column store error: [2048] "BETA_SYSTEM"."run": line 21 col 2 (at pos 821): [2048] (range 3): column store error: [2048] "BETA_SYSTEM"."beta.procedures::selectSpadeData": line 33 col 4 (at pos 992): [2048] (range 3): column store error: search table error: [34082] Execution of R script failed.;Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
stack trace:
2: file(file, ifelse(append, "a", "w"))
1: write.table(spade_data, file = file, sep = " ", row.names = FALSE,
quote = FALSE, col.names = FALSE)
CREATE PROCEDURE BETA_SYSTEM.spadeR (
IN spade_data "SPADE_INPUT_T",
OUT result "SPADE_RESULT_T"
)
LANGUAGE RLANG
AS
BEGIN
library(Matrix)
library(arules)
library(arulesSequences)
file <- tempfile()
write.table(spade_data, file = file, sep = " ", row.names=FALSE, quote = FALSE, col.names=FALSE)
df_trans <- read_baskets(file, info=c("sequenceID", "eventID", "SIZE"))
as(df_trans, "data.frame")
err_res <- tryCatch({
  s_res <- cspade(df_trans, parameter = list(support = 0), control = list(verbose = TRUE))
}, warning = function(war) {
}, error = function(err) {
}, finally = {
})
if (length(err_res) == 0) {
  result <- data.frame(sequence = "EMPTY", support = 0)
} else {
  result <- as(s_res, "data.frame")
}
END;
The stack trace indicates a problem with accessing the file. This happens in the R process spawned by Rserve, so it is outside SAP HANA; I would check whether an R process with the same working directory can actually perform the file operation.
Maybe the file system is full, or an access permission has changed?
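A quick diagnostic sketch you could run through the same Rserve setup (plain base R; the checks are illustrative):
# Check whether this R process can actually create and write a temp file.
td <- tempdir()
cat("tempdir:", td, "- exists:", dir.exists(td), "\n")
f <- tempfile()
ok <- tryCatch({ writeLines("test", f); file.remove(f); TRUE },
               error = function(e) { message(conditionMessage(e)); FALSE })
cat("writable:", ok, "\n")
If dir.exists() comes back FALSE or the write fails, the problem is the R host's temp area (full disk, changed permissions, or a TMPDIR pointing at a cleaned-up directory), not the HANA procedure itself.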

Long path/filename in windows makes write.table() error out in R

In R, I was using write.table() to write a file into a location embedded in directories with long names. But it errors out as below:
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file 'data/production/Weekly_Prod_201407_Selling_Price_Snapshot_20140930_Median_Selling_Price_Map.csv': No such file or directory
Then, when I shortened the filename to Weekly_Prod.csv, it worked! So it seems the long path plus the long filename caused R to error out.
I tested it a few times and found that the limit is 260 characters for the total length of path + filename; R errors out at 261 characters or more. Is there a way to get around this? Please help. Thanks!
There is a limit on file path length on Windows:
> write(1, paste0(paste(sample(letters, 150, TRUE), collapse = ''), '.txt'))
> write(1, paste0(paste(sample(letters, 250, TRUE), collapse = ''), '.txt'))
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file 'qvxirpnlwkqfwlxhggkscxlwhhyblrwxfpikpsukrfqwhaqvsyhdpihnoknqmxgafvawxkuijqbmvgdjwwgeumfksmhtiqwvzwmjukmmmeesvcdpdbpimarxssnrngfxwjksqshjruralhtwdnfmdhzrcwcdrnwezdhwqyisbjikdhbbygtcoeechgwrewenewbrlexliiikdnwlclbzllaxcohacadxzztgmtnmppyxtxtbopxdokjnvx.txt': No such file or directory
According to this source, it is 260 characters:
http://msdn.microsoft.com/en-us/library/aa365247.aspx#maxpath
> nchar(getwd())
[1] 23
> write(1, paste0(paste(sample(letters, 231, TRUE), collapse = ''), '.txt'))
> write(1, paste0(paste(sample(letters, 232, TRUE), collapse = ''), '.txt'))
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file 'topylmudgfnrkdilqbklylwtbwrgwbwmamxzhwwzlxxslqeuhpywahoxqxpkckvmkfjccbsqncctlovcnxctkyvgunnbqcwyiliwpfkjibanpmtupsxfboxnjaadovtdpxeloqjnbqgvkcilwljfswzlrlqixmwqpoemcemhdizwwwbgqruhepyrskiklkbylzjhrcchbusohkrwyzgablvngqrqiardubcbziex.txt': No such file or directory
> getwd()
[1] "C:/Users/john/Documents"
> nchar(file.path(getwd(), paste0(paste(sample(letters, 231, TRUE), collapse = ''), '.txt')))
[1] 259
One possible solution which may work for you is to create a virtual drive for your long directory path. It should give you a bit of leeway; see https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/subst.mspx?mfr=true
> system("subst x: C:/Users/john/Documents")
> write(1, paste0("x://", paste(sample(letters, 251, TRUE), collapse = ''), '.txt'))
When you are done with the virtual drive, you can remove it with:
system("subst x: /D")
This could be taken care of by replacing the name of the file with its Short File Name (SFN), also known as the 8.3 file name.
Type dir /x at the command prompt in the directory where the file is located; this lists the SFNs of all files in that directory.
Then replace the file name in your code with its corresponding 8.3 file name.
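From R on Windows, the same idea can be applied with base R's shortPathName() (a sketch: the directory and object names are placeholders, and 8.3 names only exist for paths already on disk, so shorten the directory part and keep the new file's own name short):
# Shorten the existing directory components to their 8.3 form, then write
# there; the new file's own name still counts toward the 260 limit.
long_dir  <- "data/production"                       # placeholder directory
short_dir <- shortPathName(normalizePath(long_dir))  # 8.3 form of each component
write.csv(my_data, file.path(short_dir, "short_name.csv"))  # my_data is a placeholder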
This is not an error in R, but a limitation imposed by Windows. Since Windows 10, the 260-character limit can be lifted (to 32,767 characters). According to this article:
In the Windows API... the maximum length for a path is MAX_PATH, which is defined as 260 characters.
Starting in Windows 10, version 1607, MAX_PATH limitations have been removed from common Win32 file and directory functions. However, you must opt-in to the new behavior.
The article gives instructions on how to opt in to longer path limits. I just did it, restarted my computer, and now I don't get that error.
