I have the following API call function:
df.load <- loaddf(r = as.character(readline(prompt = "Please enter r parameter: "))
,y= as.character(readline(prompt = "Please enter y parameter: "))
,z= as.character(readline(prompt = "Please enter z parameter: ")) )
,format="json"
# Convert to dataframe
data.df <- as.data.frame(df.load )
However, sometimes it fails to connect:
Warning in file(file, "rt") :
InternetOpenUrl failed: 'A connection with the server could not be established'
Error in file(file, "rt") : cannot open the connection
and a number of observations in the final dataframe data.df is equal to 0, but when I re-run the code 3-4 times, it works.
Therefore I want to push R to re-run this chunk in the .rmd file if the connection failed or a number of observations are equal to 0 in data.df.
Or if you have any other ideas on how to solve this issue I am open to your suggestions.
Thanks
Related
I have many URLs which I import their text in R.
I use this code:
setNames(lapply(1:1000, function(x) gettxt(get(paste0("url", x)))), paste0("url", 1:1000, "_txt")) %>%
list2env(envir = globalenv())
However, some URLs can not import and show this error:
Error in file(con, "r") : cannot open the connection In addition:
Warning message: In file(con, "r") : InternetOpenUrl failed: 'A
connection with the server could not be established'
So, my code doesn't run and doesn't import any text from any URL.
How can I recognize wrong URLs and skip them in other to import correct URLs?
one possible aproach besides trycatch mentioned by #tester can be the purrr-package:
library(purrr)
# declare function
my_gettxt <- function(x) {
gettxt(get(paste0("url", x)))
}
# make function error prone by defining the otherwise value (could be empty df with column defintion, etc.) used as output if function fails
my_gettxt <- purrr::possibly(my_gettxt , otherwise = NA)
# use map from purrr instead of apply function
my_data <- purrr::map(1:1000, ~my_gettxt(.x))
I am trying to reproduce this protocol for DNA sequencing data analysis. It requires running this bash script that links to an R script. However, I am getting this error (see bottom) that I cant seem to solve.
#!/bin/bash
Project_dir=~/base
cd /${Project_dir}
SCRTP=~/scRepliseq-Pipeline
OUTNAME="bam/G1_F121_A1.adapter_filtered2"
genome_name="mm10"
bamfile=${OUTNAME}.${genome_name}.clean_srt_markdup.bam
rscript=${SCRTP}/util/Step3_R-Aneu-Fragment-bins.R
out_dir="Aneu_analysis"
Name=‘$bamfile’
Name=${name%.adapter_filtered2.${genome_name}.clean_srt_markdup.bam}
blacklist=~/blacklist/mm10-blacklist-v1_id.bed
genome_file=~/reference/UCSC_hg19_female.fa.fai
mkdir -p ${out_dir}
Rscript --vanilla $rscript ${bamfile} ${out_dir} ${name} ${blacklist} ${genome_file}
it links to this R script
args = commandArgs(TRUE)
bamfile=args[1]
out_dir=args[2]
name=args[3]
blacklist=args[4]
genome_file=args[5]
options(scipen=100)
##Extension of file name##
ext="_mapq10_blacklist_fragment.Rdata"
ext2="_mapq10_blacklist_bin.Rdata"
library(AneuFinder)
##loading black list and genome Info##
genome_tmp <- read.table(genome_file,sep="\t") #UCSC_mm9.woYwR.fa.fai
genome=data.frame(UCSC_seqlevel=genome_tmp$V1,UCSC_seqlength=genome_tmp$V2)
chromosomes=as.character(genome$UCSC_seqlevel)
##setup output directories##
out_dir_f=paste0(out_dir,"/fragment")
out_dir_b=paste0(out_dir,"/bins")
dir.create(out_dir,showWarnings = FALSE)
dir.create(out_dir_f,showWarnings = FALSE)
dir.create(out_dir_b,showWarnings = FALSE)
##save the fragment file (>10 MAPQ), filtering out the blacklist regions##
raw_reads=bam2GRanges(bamfile,remove.duplicate.reads = TRUE,min.mapq = 10,blacklist = blacklist)
save(raw_reads,file = paste0(out_dir_f,"/",name,ext))
##save the bin data file ##
bins_reads=binReads(raw_reads,
assembly=genome,
chromosomes=chromosomes,
binsizes=c(40000,80000,100000,200000,500000))
rpm=1000000/length(raw_reads)
bins_reads[["rpm"]]=rpm
save(bins_reads,file=paste(out_dir_b,"/",name,ext2,sep=""))
It shows this error:
Error in file(file, "rt") : invalid 'description' argument
Calls: read.table -> file
Execution halted
I am attempting to do multiple imputation with the "mi" package (v1.0). Due to computing/processing time constraints, I split my code into two files. The first does all of the mi-style preprocessing, and the second actually runs the imputation.
The first file runs without error, but I am including it here for completeness (below is an edited, shorter version of the file):
require(mi)
# Load data for multiple imputation
data = as.data.frame(read.delim("for_mi.csv"))
...
# Declare data as missing data frame for MI functions
mdf = missing_data.frame(data)
mdf <- change(mdf, y = "x", what = "type", to = "nonnegative-continuous")
... (many type corrections later) ...
mdf <- change(mdf, y = "y", what = "type", to = "positive-continuous")
# Save pre-processed missing-format data for analysis in r_mi_2.R.
saveRDS(mdf,"preprocessed.rds")
The second file is the one that throws the error:
require(mi)
# Load output from first file
mdf <- readRDS("preprocessed.rds")
# Note: at this point, mdf loads as a missing_data.frame.
# MI commands such as show(mdf) function as expected.
# Impute data
imputations <- mi(mdf, n.iter = 30, n.chains = 4, max.minutes = Inf, parallel = TRUE)
I get the following output:
Chain 1
Chain 1 Iteration 0
Chain 2
Chain 2 Iteration 0
Chain 1 Iteration 1
Chain 3
Chain 3 Iteration 0
Chain 2 Iteration 1
Chain 4
Chain 4 Iteration 0
Chain 3 Iteration 1
Chain 4 Iteration 1
Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: cannot open the connection
Calls: mi ... clusterApply -> staticClusterApply -> checkForRemoteErrors
Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/var/tmp/Rtmp0TqkWn/mi1502972500/pars_1.csv': No such file or directory
Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/var/tmp/Rtmp0TqkWn/mi1502972500/pars_2.csv': No such file or directory
Execution halted
Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/var/tmp/Rtmp0TqkWn/mi1502972500/pars_3.csv': No such file or directory
Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/var/tmp/Rtmp0TqkWn/mi1502972500/pars_4.csv': No such file or directory
Other background info:
I am running the code on a cluster, using 8 processors on a single node. I have also tried running it locally on my computer, with the same result.
I have tried varying the number of chains, lowering the number of iterations, and setting parallel = FALSE, all to no avail.
I have tried running the code with and without the options(mc.cores = 4) line that appears in the mi vignette (see here, page 4)
The mi vignette linked above states that many errors from running mi stem from running out of RAM. I'm not sure how to test this, but some info: the starting object mdf is 128MB, the per-user server cap on the cluster is 32GB, and the error throws even if I set parallel=FALSE or n.iter=1.
Any help would be greatly appreciated. Thank you!
Edit: The error output with parallel = FALSE is:
Chain 1
Chain 1 Iteration 0
Chain 1 Iteration 1
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
Calls: mi -> mi -> .local -> .mi -> write.table -> file
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file '/var/tmp/Rtmp0TqkWn/mi1502972500/pars_1.csv': No such file or directory
Execution halted
I would like to create a loop to load this files through read.esetof bioconductor.
I tried that:
for(k in 1:29){
expr <- paste0("/home/proj/MT_Nellore/R/eBrowser/Adjusted/LRRadjustedextremes0.5kgchr",k,".txt")
pdat <- paste0("/home/proj/MT_Nellore/R/eBrowser/Adjusted/Samplesbinary0.5.txt")
ffdat <- paste0("/home/proj/MT_Nellore/R/LRR/Chr_adjusted/probeslabeladjustedchr",k,".txt")
eset <- read.eset(exprs.file="expr", pdat.file="/home/proj/MT_Nellore/R/eBrowser/Adjusted/Samplesbinary0.5.txt", fdat.file="ffdat")
}
However I get this error:
## Error in file(file, "r") : cannot open the connection
## In addition: Warning message:
## In file(file, "r") : cannot open file 'ffdat': No such file or directory
Any suggestions?
Ah - just spotted the error - you must remove quotes from around the "ffdat" on the final line, and same for the "expr"
(Windows 7 / R version 3.0.1)
Below the commands and the resulting error:
> library(tm)
> pdf <- readPDF(PdftotextOptions = "-layout")
> dat <- pdf(elem = list(uri = "17214.pdf"), language="de", id="id1")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp
\RtmpS8Uql1\pdfinfo167c2bc159f8': No such file or directory
How do I solve this issue?
EDIT I
(As suggested by Ben and described here)
I downloaded Xpdf copied the 32bit version to
C:\Program Files (x86)\xpdf32
and the 64bit version to
C:\Program Files\xpdf64
The environment variables pdfinfo and pdftotext are referring to the respective executables either 32bit (tested with R 32bit) or to 64bit (tested with R 64bit)
EDIT II
One very confusing observation is that starting from a fresh session (tm not loaded) the last command alone will produce the error:
> dat <- pdf(elem = list(uri = "17214.pdf"), language="de", id="id1")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpKi5GnL
\pdfinfode8283c422f': No such file or directory
I don't understand this at all because the function variable is not defined by tm.readPDF yet. Below you'll find the function pdf refers to "naturally" and to what is returned by tm.readPDF:
> pdf
function (elem, language, id)
{
meta <- tm:::pdfinfo(elem$uri)
content <- system2("pdftotext", c(PdftotextOptions, shQuote(elem$uri),
"-"), stdout = TRUE)
PlainTextDocument(content, meta$Author, meta$CreationDate,
meta$Subject, meta$Title, id, meta$Creator, language)
}
<environment: 0x0674bd8c>
> library(tm)
> pdf <- readPDF(PdftotextOptions = "-layout")
> pdf
function (elem, language, id)
{
meta <- tm:::pdfinfo(elem$uri)
content <- system2("pdftotext", c(PdftotextOptions, shQuote(elem$uri),
"-"), stdout = TRUE)
PlainTextDocument(content, meta$Author, meta$CreationDate,
meta$Subject, meta$Title, id, meta$Creator, language)
}
<environment: 0x0c3d7364>
Apparently there is no difference - then why use readPDF at all?
EDIT III
The pdf file is located here: C:\Users\Raffael\Documents
> getwd()
[1] "C:/Users/Raffael/Documents"
EDIT IV
First instruction in pdf() is a call to tm:::pdfinfo() - and there the error is caused within the first few lines:
> outfile <- tempfile("pdfinfo")
> on.exit(unlink(outfile))
> status <- system2("pdfinfo", shQuote(normalizePath("C:/Users/Raffael/Documents/17214.pdf")),
+ stdout = outfile)
> tags <- c("Title", "Subject", "Keywords", "Author", "Creator",
+ "Producer", "CreationDate", "ModDate", "Tagged", "Form",
+ "Pages", "Encrypted", "Page size", "File size", "Optimized",
+ "PDF version")
> re <- sprintf("^(%s)", paste(sprintf("%-16s", sprintf("%s:",
+ tags)), collapse = "|"))
> lines <- readLines(outfile, warn = FALSE)
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpquRYX6\pdfinfo8d419174450': No such file or direc
Apparently tempfile() simply doesn't create a file.
> outfile <- tempfile("pdfinfo")
> outfile
[1] "C:\\Users\\Raffael\\AppData\\Local\\Temp\\RtmpquRYX6\\pdfinfo8d437bd65d9"
The folder C:\Users\Raffael\AppData\Local\Temp\RtmpquRYX6 exists and holds some files but none is named pdfinfo8d437bd65d9.
Intersting, on my machine after a fresh start pdf is a function to convert an image to a PDF:
getAnywhere(pdf)
A single object matching ‘pdf’ was found
It was found in the following places
package:grDevices
namespace:grDevices [etc.]
But back to the problem of reading in PDF files as text, fiddling with the PATH is a bit hit-and-miss (and annoying if you work across several different computers), so I think the simplest and safest method is to call pdf2text using system as Tony Breyal describes here.
In your case it would be (note the two sets of quotes):
system(paste('"C:/Program Files/xpdf64/pdftotext.exe"',
'"C:/Users/Raffael/Documents/17214.pdf"'), wait=FALSE)
This could easily be extended with an *apply function or loop if you have many PDF files.