Creating a loop to use read.eset in bioconductor - r

I would like to create a loop to load these files with read.eset from Bioconductor.
I tried this:
for(k in 1:29){
expr <- paste0("/home/proj/MT_Nellore/R/eBrowser/Adjusted/LRRadjustedextremes0.5kgchr",k,".txt")
pdat <- paste0("/home/proj/MT_Nellore/R/eBrowser/Adjusted/Samplesbinary0.5.txt")
ffdat <- paste0("/home/proj/MT_Nellore/R/LRR/Chr_adjusted/probeslabeladjustedchr",k,".txt")
eset <- read.eset(exprs.file="expr", pdat.file="/home/proj/MT_Nellore/R/eBrowser/Adjusted/Samplesbinary0.5.txt", fdat.file="ffdat")
}
However I get this error:
## Error in file(file, "r") : cannot open the connection
## In addition: Warning message:
## In file(file, "r") : cannot open file 'ffdat': No such file or directory
Any suggestions?

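Ah, just spotted the error: you must remove the quotes around "ffdat" on the final line, and the same for "expr". With the quotes, R looks for files literally named ffdat and expr instead of using the paths stored in those variables.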
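The difference between a quoted name and a variable can be sketched without read.eset. This is a minimal illustration, assuming a small temporary file and base R's read.table as a stand-in; the variable name expr_file and the data are hypothetical.

```r
# Write a small tab-separated file to a temporary location (hypothetical data).
expr_file <- tempfile(fileext = ".txt")
write.table(data.frame(a = 1:2, b = 3:4), expr_file,
            sep = "\t", row.names = FALSE)

# Quoted: R looks for a file literally called "expr_file" in the working
# directory, which does not exist, so the call fails.
quoted <- tryCatch(
  read.table("expr_file", header = TRUE),
  warning = function(w) NULL,
  error = function(e) NULL
)

# Unquoted: R uses the path stored in the variable.
unquoted <- read.table(expr_file, header = TRUE)
```

Note also that the loop in the question overwrites eset on every iteration; collecting the results in a list indexed by k would keep all 29 objects.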

Related

Error in file(file, "rt") : cannot open the connection. This is the error that pops up when I try to run this code

tripdata_2020_05 <- read.csv("../input/Excel/202005-divvy-tripdata.csv")
tripdata_2020_06 <- read.csv("../input/Excel/202006-divvy-tripdata.csv")
tripdata_2020_07 <- read.csv("../input/Excel/202007-divvy-tripdata.csv")
tripdata_2020_08 <- read.csv("../input/Excel/202008-divvy-tripdata.csv")
tripdata_2020_09 <- read.csv("../input/Excel/202009-divvy-tripdata.csv")
tripdata_2020_10 <- read.csv("../input/Excel/202010-divvy-tripdata.csv")
tripdata_2020_11 <- read.csv("../input/Excel/202011-divvy-tripdata.csv")
tripdata_2020_12 <- read.csv("../input/Excel/202012-divvy-tripdata.csv")
tripdata_2021_01 <- read.csv("../input/Excel/202101-divvy-tripdata.csv")
tripdata_2021_02 <- read.csv("../input/Excel/202102-divvy-tripdata.csv")
tripdata_2021_03 <- read.csv("../input/Excel/202103-divvy-tripdata.csv")
tripdata_2021_04 <- read.csv("../input/Excel/202104-divvy-tripdata.csv")
I'm trying to load the data set and when I run it this pops up
Warning in file(file, "rt") :
cannot open file '../input/Excel/202006-divvy-tripdata.csv': No such file or directory
Error in file(file, "rt") : cannot open the connection
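One way to narrow this down, as a sketch that assumes the monthly files follow the naming pattern shown in the question: the warning names the 202006 file, which suggests the 202005 line ran, so it is worth checking which of the expected paths actually exist relative to the working directory.

```r
# Build the twelve expected paths (2020-05 through 2021-04).
months <- format(seq(as.Date("2020-05-01"), as.Date("2021-04-01"), by = "month"),
                 "%Y%m")
paths <- file.path("../input/Excel", paste0(months, "-divvy-tripdata.csv"))

# Relative paths are resolved against the working directory.
message("Working directory: ", getwd())

# Report which files are missing, then read only the ones that exist.
missing_files <- paths[!file.exists(paths)]
found_files <- paths[file.exists(paths)]
tripdata <- setNames(lapply(found_files, read.csv), basename(found_files))
```

If missing_files is not empty, comparing it with list.files("../input/Excel") should show whether the files are absent or simply named differently.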

How to skip some lines in R

I have many URLs whose text I import into R.
I use this code:
setNames(lapply(1:1000, function(x) gettxt(get(paste0("url", x)))), paste0("url", 1:1000, "_txt")) %>%
list2env(envir = globalenv())
However, some URLs cannot be imported and produce this error:
Error in file(con, "r") : cannot open the connection In addition:
Warning message: In file(con, "r") : InternetOpenUrl failed: 'A
connection with the server could not be established'
So my code stops and doesn't import text from any URL.
How can I recognize the wrong URLs and skip them in order to import the correct ones?
One possible approach, besides the tryCatch mentioned by #tester, is the purrr package:
library(purrr)
# declare function
my_gettxt <- function(x) {
gettxt(get(paste0("url", x)))
}
# make the function error-tolerant by defining the otherwise value (could be an empty df with column definitions, etc.) that is returned if the function fails
my_gettxt <- purrr::possibly(my_gettxt , otherwise = NA)
# use map from purrr instead of apply function
my_data <- purrr::map(1:1000, ~my_gettxt(.x))
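For comparison, the tryCatch route can be sketched in base R. This is a minimal sketch under stated assumptions: fetch_text is a hypothetical stand-in for gettxt that fails for some inputs, and the URLs are made up.

```r
# Hypothetical stand-in for gettxt(): fails for URLs containing "bad".
fetch_text <- function(url) {
  if (grepl("bad", url)) stop("cannot open the connection")
  paste("text from", url)
}

# Wrap the call so that an error yields NA instead of stopping the loop.
safe_fetch <- function(url) {
  tryCatch(fetch_text(url), error = function(e) NA_character_)
}

urls <- c("http://example.com/a", "http://example.com/bad", "http://example.com/c")

# One result per URL, named by URL; failed URLs are NA.
texts <- vapply(urls, safe_fetch, character(1))
failed_urls <- names(texts)[is.na(texts)]
```

Keeping the results in one named vector or list, rather than creating 1000 separate variables with list2env, also makes it easy to see which URLs failed.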

How to open a .pre file in R?

I am wondering how to open a .pre file in R. I can open the file in notepad, and see it clearly on Windows.
I also have an object called "newfiles" that lists many .pre files, but when I try to pull these files into R, I get the error message below.
Here is the code I have for my files:
newfiles <- dir("~/Desktop/_preFiles_byGrid")
> newfile
[1] "262778 _PRISM.pre"
> head(newfiles)
[1] "262778 _PRISM.pre" "262779 _PRISM.pre" "262780 _PRISM.pre" "262781 _PRISM.pre" "262782 _PRISM.pre" "262783 _PRISM.pre"
for (newfile in newfiles) {
n <- read.table(file.path("_preFiles_byGrid", newfile), sep=",", as.is=TRUE, header=FALSE)
}
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '_preFiles_byGrid/262778 _PRISM.pre': No such file or directory
If you do
newfiles <- dir("~/Desktop/_preFiles_byGrid", full.names=TRUE)
Then you can just do
n <- read.table(newfile, sep=",", as.is=TRUE, header=FALSE)
in your loop without having to worry about rebuilding the path with file.path() and you are much less likely to get missing file errors this way.
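A self-contained sketch of this, assuming a temporary folder with two small comma-separated files in place of the real _preFiles_byGrid directory (the folder name pre_demo and the file contents are made up):

```r
# Create a demo folder with two small .pre files (hypothetical contents).
dir_path <- file.path(tempdir(), "pre_demo")
dir.create(dir_path, showWarnings = FALSE)
writeLines("1,2,3", file.path(dir_path, "262778 _PRISM.pre"))
writeLines("4,5,6", file.path(dir_path, "262779 _PRISM.pre"))

# full.names = TRUE returns paths that can be passed to read.table directly.
newfiles <- dir(dir_path, pattern = "\\.pre$", full.names = TRUE)

# Read every file and keep all results in a named list,
# instead of overwriting n on each iteration.
tables <- lapply(newfiles, read.table, sep = ",", as.is = TRUE, header = FALSE)
names(tables) <- basename(newfiles)
```

The original loop failed because dir() listed ~/Desktop/_preFiles_byGrid while read.table() looked in _preFiles_byGrid relative to the working directory; full paths avoid that mismatch.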

Using read.arff() function in R and importing .arff files

I am trying to import this dataset of .arff type
file_location <- file.path("/Users","supreet","Downloads","Chronic_Kidney_Disease1/")
Chronic_Kidney_Disease <- read.arff(paste(file_location,"chronic_kidney_disease.arff",sep=""))
But it is throwing the following error
Error in file(arff_file, "rb") : cannot open the connection In
addition: Warning message: In file(arff_file, "rb") : cannot open
file
'/Users/supreet/Downloads/Chronic_Kidney_Disease1/chronic_kidney_disease.arff.arff':
No such file or directory
Also, if I remove the .arff extension, since it already seems to be appended automatically:
file_location <- file.path("/Users","supreet","Downloads","Chronic_Kidney_Disease1/")
Chronic_Kidney_Disease <- read.arff(paste(file_location,"chronic_kidney_disease",sep=""))
I get this error:
Error: XML content does not seem to be XML:
'/Users/supreet/Downloads/Chronic_Kidney_Disease1/chronic_kidney_disease.xml'
In addition: Warning message: In matrix(unlist(strsplit(arff_data,
",", fixed = T)), ncol = num_attrs, : data length [10001] is not a
sub-multiple or multiple of the number of rows [401]

Error trying to read a PDF using readPDF from the tm package

(Windows 7 / R version 3.0.1)
Below are the commands and the resulting error:
> library(tm)
> pdf <- readPDF(PdftotextOptions = "-layout")
> dat <- pdf(elem = list(uri = "17214.pdf"), language="de", id="id1")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpS8Uql1\pdfinfo167c2bc159f8': No such file or directory
How do I solve this issue?
EDIT I
(As suggested by Ben and described here)
I downloaded Xpdf and copied the 32-bit version to
C:\Program Files (x86)\xpdf32
and the 64-bit version to
C:\Program Files\xpdf64
The environment variables pdfinfo and pdftotext point to the respective executables, either the 32-bit ones (tested with 32-bit R) or the 64-bit ones (tested with 64-bit R).
EDIT II
One very confusing observation is that starting from a fresh session (tm not loaded) the last command alone will produce the error:
> dat <- pdf(elem = list(uri = "17214.pdf"), language="de", id="id1")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpKi5GnL\pdfinfode8283c422f': No such file or directory
I don't understand this at all, because the variable pdf has not yet been assigned by tm::readPDF. Below is the function that pdf refers to "naturally", followed by what tm::readPDF returns:
> pdf
function (elem, language, id)
{
meta <- tm:::pdfinfo(elem$uri)
content <- system2("pdftotext", c(PdftotextOptions, shQuote(elem$uri),
"-"), stdout = TRUE)
PlainTextDocument(content, meta$Author, meta$CreationDate,
meta$Subject, meta$Title, id, meta$Creator, language)
}
<environment: 0x0674bd8c>
> library(tm)
> pdf <- readPDF(PdftotextOptions = "-layout")
> pdf
function (elem, language, id)
{
meta <- tm:::pdfinfo(elem$uri)
content <- system2("pdftotext", c(PdftotextOptions, shQuote(elem$uri),
"-"), stdout = TRUE)
PlainTextDocument(content, meta$Author, meta$CreationDate,
meta$Subject, meta$Title, id, meta$Creator, language)
}
<environment: 0x0c3d7364>
Apparently there is no difference - then why use readPDF at all?
EDIT III
The pdf file is located here: C:\Users\Raffael\Documents
> getwd()
[1] "C:/Users/Raffael/Documents"
EDIT IV
First instruction in pdf() is a call to tm:::pdfinfo() - and there the error is caused within the first few lines:
> outfile <- tempfile("pdfinfo")
> on.exit(unlink(outfile))
> status <- system2("pdfinfo", shQuote(normalizePath("C:/Users/Raffael/Documents/17214.pdf")),
+ stdout = outfile)
> tags <- c("Title", "Subject", "Keywords", "Author", "Creator",
+ "Producer", "CreationDate", "ModDate", "Tagged", "Form",
+ "Pages", "Encrypted", "Page size", "File size", "Optimized",
+ "PDF version")
> re <- sprintf("^(%s)", paste(sprintf("%-16s", sprintf("%s:",
+ tags)), collapse = "|"))
> lines <- readLines(outfile, warn = FALSE)
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpquRYX6\pdfinfo8d419174450': No such file or direc
Apparently tempfile() simply doesn't create a file.
> outfile <- tempfile("pdfinfo")
> outfile
[1] "C:\\Users\\Raffael\\AppData\\Local\\Temp\\RtmpquRYX6\\pdfinfo8d437bd65d9"
The folder C:\Users\Raffael\AppData\Local\Temp\RtmpquRYX6 exists and holds some files but none is named pdfinfo8d437bd65d9.
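As a side note on the observation above, tempfile() only generates a unique file name; the file appears once something writes to it. In tm:::pdfinfo that writer is the system2("pdfinfo", ..., stdout = outfile) call, so a missing file would be consistent with the pdfinfo command not having run. A minimal sketch of the tempfile() behaviour, with made-up file contents:

```r
# tempfile() returns a path but does not create the file.
outfile <- tempfile("pdfinfo")
exists_before <- file.exists(outfile)

# The file exists only after something writes to it.
writeLines("Title: demo", outfile)
exists_after <- file.exists(outfile)
lines <- readLines(outfile, warn = FALSE)

# Clean up the temporary file.
unlink(outfile)

# Returns an empty string if R cannot find pdfinfo on the PATH.
pdfinfo_path <- Sys.which("pdfinfo")
```

If pdfinfo_path is empty, R cannot locate the executable, which would explain why nothing was written to the temporary file.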
Interesting: on my machine, after a fresh start, pdf is the grDevices function that opens a PDF graphics device:
getAnywhere(pdf)
A single object matching ‘pdf’ was found
It was found in the following places
package:grDevices
namespace:grDevices [etc.]
But back to the problem of reading PDF files as text: fiddling with the PATH is a bit hit-and-miss (and annoying if you work across several different computers), so I think the simplest and safest method is to call pdftotext using system, as Tony Breyal describes here.
In your case it would be (note the two sets of quotes):
system(paste('"C:/Program Files/xpdf64/pdftotext.exe"',
'"C:/Users/Raffael/Documents/17214.pdf"'), wait=FALSE)
This could easily be extended with an *apply function or loop if you have many PDF files.
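A rough sketch of that extension, assuming the pdftotext location from the command above; the helper build_cmd and the second file name (17215.pdf) are hypothetical and only there for illustration:

```r
# Assumed install location, taken from the command above; adjust as needed.
pdftotext_exe <- "C:/Program Files/xpdf64/pdftotext.exe"

# Build one command line per PDF, wrapping both paths in double quotes.
build_cmd <- function(pdf_path, exe = pdftotext_exe) {
  paste(shQuote(exe, type = "cmd"), shQuote(pdf_path, type = "cmd"))
}

# Hypothetical list of PDFs; list.files(pattern = "\\.pdf$") could supply it.
pdf_files <- c("C:/Users/Raffael/Documents/17214.pdf",
               "C:/Users/Raffael/Documents/17215.pdf")

cmds <- vapply(pdf_files, build_cmd, character(1))

# Run the commands only if the executable is actually present.
if (file.exists(pdftotext_exe)) {
  for (cmd in cmds) system(cmd, wait = TRUE)
}
```

Here wait = TRUE is used so that each conversion finishes before the next starts; the original command used wait = FALSE.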
