Combine columns from multiple files - r

I have written code that should combine multiple files into one file by combining the second column of each input file. The first column is similar across input files. However, it gives an error that I cannot understand.
files <- list.files(path = "/Rfam/",pattern='\\.sam')
My code
lst <- lapply(files, function(x) read.csv(x,header=TRUE))
setNames(Reduce(function(...) merge(..., by='V1'),
lst),c('ID', paste0('file',seq_along(files))) )
The error
> lst <- lapply(files, function(x) read.csv(x,header=TRUE))
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput100G.sam': No such file or directory
My files:
> head(files)
[1] "Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput100G.sam"
[2] "Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput100R.sam"
[3] "Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput106G.sam"
[4] "Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput106R.sam"
[5] "Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput122G.sam"
[6] "Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput122R.sam"
> length(files)
[1] 96
Example of input
DMED7013:Rfam robinm$ head Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput402R.sam
Seq_../trimmed/402R.tally.fasta __not_aligned
__too_low_aQual 3
mir-10 5
Y_RNA 4
__too_low_aQual 0
__too_low_aQual 0
__not_aligned 1
mir-8 2
mir-671 3
mir-671 16
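The warning names the file without its directory: by default list.files() returns bare file names, so read.csv() looks for them in the current working directory rather than in /Rfam/. Below is a minimal sketch of one way around this, assuming the inputs are whitespace-delimited, have no header row, and contain each ID only once per file. The temporary directory and the two small sample files are made up for illustration. The example input above shows repeated IDs (mir-671 appears twice), in which case merge() would multiply rows, so the counts may need aggregating first.

```r
# Hypothetical stand-in for the /Rfam/ directory: two tiny count files.
dir_path <- file.path(tempdir(), "rfam_demo")
dir.create(dir_path, showWarnings = FALSE)
writeLines(c("mir-10 5", "Y_RNA 4", "mir-8 2"), file.path(dir_path, "a.sam"))
writeLines(c("mir-10 7", "Y_RNA 0", "mir-8 9"), file.path(dir_path, "b.sam"))

# full.names = TRUE returns paths that include the directory, so the files
# can be opened regardless of the current working directory.
files <- list.files(path = dir_path, pattern = "\\.sam$", full.names = TRUE)

# header = FALSE makes read.table() name the columns V1 and V2.
lst <- lapply(files, function(x) {
  read.table(x, header = FALSE, stringsAsFactors = FALSE)
})

# Rename the columns up front so repeated merges never clash on "V2".
for (i in seq_along(lst)) {
  names(lst[[i]]) <- c("ID", paste0("file", i))
}

# Merge all data frames pairwise on the shared ID column.
combined <- Reduce(function(x, y) merge(x, y, by = "ID"), lst)
print(combined)
```

Renaming before merging avoids the V2.x / V2.y suffixes that merge() would otherwise generate, which become ambiguous once more than two files are involved.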

Related

Error in file(file, "rt") - Coursera R Programming Week 2 Assignment 1

I am doing the following assignment:
[Coursera Air pollution Assignment][1]
[1]: https://i.stack.imgur.com/QAcMG.png
After doing dir.create("specdata") and unzipping the file, all 332 files went to the "specdata" directory. So I did the function:
pollutantmean <- function(directory, pollutant, id = 1:332) {
lista <- list.files("C:/Users/Ana/Desktop/Temporario/specdata", pattern = "*.csv")
for(i in id) {
dados <- read.csv(lista[i])
valor <- numeric(dados[pollutant])
}
mean(valor, na.rm = TRUE)
}
And as I tested it with
pollutantmean("specdata", "sulfate", 1:10)
I got the error message:
Error in file(file, "rt") : not possible to open the connection
Warning message: In file(file, "rt") :
Could anyone help? When I call list.files() they all appear (001.csv, ..., 332.csv), and my working directory is the parent directory of "specdata".
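A likely cause, going by the code shown, is that list.files() returns bare names such as "001.csv", which read.csv() then looks for in the working directory instead of inside "specdata". Here is a minimal sketch of a version that keeps the directory in each path. The two small CSV files are made up for illustration, and the loop also accumulates values across files, which is an assumption about what the assignment intends.

```r
# Hypothetical miniature "specdata" directory with two monitor files.
directory <- file.path(tempdir(), "specdata")
dir.create(directory, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(0.5, 0.7, NA)),
          file.path(directory, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7), nitrate = c(NA, 1.1)),
          file.path(directory, "002.csv"), row.names = FALSE)

pollutantmean <- function(directory, pollutant, id = 1:332) {
  # full.names = TRUE keeps the directory in each returned path.
  lista <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  valor <- numeric(0)
  for (i in id) {
    dados <- read.csv(lista[i])
    # Append this file's values instead of overwriting the previous ones.
    valor <- c(valor, dados[[pollutant]])
  }
  mean(valor, na.rm = TRUE)
}

result <- pollutantmean(directory, "sulfate", 1:2)
print(result)
```

Using the directory argument rather than a hard-coded absolute path also means the function works from any working directory, as long as the path passed in is valid.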

How to read IMF xls- or sdmx-data from url?

From the IMF I want to read a .xls file from a URL directly into R, but all attempts have failed so far. Weirdly, I can download the file manually or with download.file() and open it without problems in Microsoft Excel or in a text editor. However, even then I can't read the data into R.
I always try with both https and http.
myUrl <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
myUrl2 <- "http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
1. Classic approach – fails.
imf <- read.table(file=myUrl, sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
# line 51 did not have 55 elements
imf <- read.table(file=url(myUrl), sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
# line 51 did not have 55 elements
2. Several packages – fails.
imf <- readxl::read_xls(myUrl)
# Error: `path` does not exist: ‘https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’
imf <- readxl::read_xls(myUrl2)
# Error: `path` does not exist: ‘http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’
imf <- gdata::read.xls(myUrl)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv" "1"' had status 2
# Error in file.exists(tfn) : invalid 'file' argument
imf <- gdata::read.xls(myUrl2) # <---------------------------------------------- THIS DOWNLOADS SOMETHING AT LEAST!
# trying URL 'http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls'
# Content type 'application/vnd.ms-excel' length unknown
# downloaded 8.9 MB
#
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87f532cb3.xls"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument
3. Tempfile approach – fails.
temp <- tempfile()
download.file(myUrl, temp) # THIS WORKS...
## BUT...
imf <- gdata::read.xls(temp)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
# Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
# running command '"C:\STRAWB~1\perl\bin\perl.exe"
# "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f8746a46db"
# "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument
# even not...
tmp1 <- readLines(temp)
# Warning message:
# In readLines(temp) :
# incomplete final line found on
# 'C:\Users\jay\AppData\Local\Temp\Rtmp00GPlq\file2334435c2905'
str(tmp1)
# chr [1:8733] "WEO Country Code\tISO\tWEO Subject Code\tCountry\tSubject
# Descriptor\tSubject Notes\tUnits\tScale\tCountry/Seri"| __truncated__ ...
4. SDMX
I also tried the SDMX data the IMF offers, but also without success. This would probably be a more sophisticated approach, but I have never used SDMX.
link <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019_SDMXData.zip"
temp <- tempfile()
download.file(link, temp, quiet=TRUE)
imf <- rsdmx::readSDMX(temp)
# Error in function (type, msg, asError = TRUE) :
# Could not resolve host: C
# imf <- rsdmx::readSDMX(unzip(temp)) # runs forever and crashes R
unlink(temp)
Now... does anybody know what's going on, and how I may load the data into R?
Why not just use fill=TRUE?
imf <- read.table(file=myUrl, sep="\t", header=TRUE, fill = TRUE)
From ?read.table:
fill: logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See ‘Details’.
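To see what fill = TRUE does without hitting the network, here is a small self-contained sketch. The file contents are made up, loosely imitating a tab-separated file with a short footer row, which is an assumption about why line 51 of the real file has fewer elements.

```r
# Hypothetical tab-separated file whose last row is shorter than the header.
tmp <- tempfile(fileext = ".xls")
writeLines(c("ISO\tCountry\tY2018\tY2019",
             "AFG\tAfghanistan\t1.0\t2.0",
             "Source: made-up footer"), tmp)

# Without fill = TRUE, read.table() stops at the short row; capture the error.
res <- tryCatch(read.table(tmp, sep = "\t", header = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# With fill = TRUE the short row is padded with blank fields.
imf <- read.table(tmp, sep = "\t", header = TRUE, fill = TRUE,
                  quote = "", stringsAsFactors = FALSE)
print(dim(imf))
unlink(tmp)
```

The padded rows still end up in the data frame, so footer lines like the one above may need to be dropped afterwards.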

How to open a .pre file in R?

I am wondering how to open a .pre file in R. I can open the file in notepad, and see it clearly on Windows.
I also have an object called "newfiles" that lists many .pre files, but when I try to pull these files into R, I get the error message below.
Here is the code I have for my files:
newfiles <- dir("~/Desktop/_preFiles_byGrid")
> newfile
[1] "262778 _PRISM.pre"
> head(newfiles)
[1] "262778 _PRISM.pre" "262779 _PRISM.pre" "262780 _PRISM.pre" "262781 _PRISM.pre" "262782 _PRISM.pre" "262783 _PRISM.pre"
for (newfile in newfiles) {
n <- read.table(file.path("_preFiles_byGrid", newfile), sep=",", as.is=TRUE, header=FALSE)
}
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '_preFiles_byGrid/262778 _PRISM.pre': No such file or directory
If you do
newfiles <- dir("~/Desktop/_preFiles_byGrid", full.names=TRUE)
Then you can just do
n <- read.table(newfile, sep=",", as.is=TRUE, header=FALSE)
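in your loop without having to rebuild the path with file.path(). In the original code the rebuilt path "_preFiles_byGrid/..." is relative to the working directory, not to ~/Desktop, which is why the file is not found. With full paths you are much less likely to get missing-file errors.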
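A minimal sketch of the full.names approach, using a temporary directory and made-up file contents in place of the real .pre files. It also stores each table in a list, since the loop in the question overwrites n on every iteration (that part is an addition, not something the answer above asks for).

```r
# Hypothetical directory standing in for ~/Desktop/_preFiles_byGrid.
pre_dir <- file.path(tempdir(), "_preFiles_byGrid")
dir.create(pre_dir, showWarnings = FALSE)
writeLines(c("1,2,3", "4,5,6"), file.path(pre_dir, "262778 _PRISM.pre"))
writeLines(c("7,8,9"), file.path(pre_dir, "262779 _PRISM.pre"))

# full.names = TRUE returns complete paths, so no file.path() is needed later.
newfiles <- dir(pre_dir, full.names = TRUE)

# Keep every table in a named list rather than overwriting one variable.
tables <- lapply(newfiles, read.table, sep = ",", as.is = TRUE, header = FALSE)
names(tables) <- basename(newfiles)
str(tables)
```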

R: trouble assigning values to a dynamic variable in a dataframe

I am trying to assign values to a dataframe variable defined by the user. The user specifies the name of the variable, let's call this x, in the dataframe df. For simplicity I want to assign a value of 3 to everything in the column the user specifies. The simplified code is:
variableName <- paste("df$", x, sep="")
eval(parse(text=variableName)) <- 3
But I get an error:
Error in file(filename, "r") : cannot open the connection
In addition: Warning message:
In file(filename, "r") :
cannot open file 'df$x': No such file or directory
I've tried all kinds of remedies to no avail. If I simply print the values of the column with
eval(parse(text=variableName))
I get no errors and it prints out ok. It's only when I try to give that column a value that I get the error. Any help would be appreciated.
I believe the issue is that there is no way to use the result of eval() on the left-hand side of an assignment.
df = data.frame(foo = 1:5,
bar = -3)
x = "bar"
variableName <- paste("df$", x, sep="")
eval(parse(text=variableName)) <- 3
#> Warning in file(filename, "r"): cannot open file 'df$bar': No such file or
#> directory
#> Error in file(filename, "r"): cannot open the connection
## This error is a bit misleading. Breaking it apart I get a different error.
eval(expression(df$bar)) <- 3
#> Error in eval(expression(df$bar)) <- 3: could not find function "eval<-"
## And it works if you put it all in the string to be parsed.
ex1 <- paste0("df$", x, "<-3")
eval(parse(text=ex1))
df
#> foo bar
#> 1 1 3
#> 2 2 3
#> 3 3 3
#> 4 4 3
#> 5 5 3
## But I doubt that's the best way to do it!
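One alternative that avoids eval(parse()) altogether is to index the data frame with [[ ]], which accepts the column name as a character string. A minimal sketch, reusing the same example data:

```r
df <- data.frame(foo = 1:5, bar = -3)
x <- "bar"  # column name supplied by the user

# [[ ]] takes a character string, so no code needs to be built and parsed.
df[[x]] <- 3
print(df)
```

This also works when the name in x does not exist yet, in which case a new column is created.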

Creating a loop to use read.eset in bioconductor

I would like to create a loop to load these files through read.eset of Bioconductor.
I tried this:
for(k in 1:29){
expr <- paste0("/home/proj/MT_Nellore/R/eBrowser/Adjusted/LRRadjustedextremes0.5kgchr",k,".txt")
pdat <- paste0("/home/proj/MT_Nellore/R/eBrowser/Adjusted/Samplesbinary0.5.txt")
ffdat <- paste0("/home/proj/MT_Nellore/R/LRR/Chr_adjusted/probeslabeladjustedchr",k,".txt")
eset <- read.eset(exprs.file="expr", pdat.file="/home/proj/MT_Nellore/R/eBrowser/Adjusted/Samplesbinary0.5.txt", fdat.file="ffdat")
}
However I get this error:
## Error in file(file, "r") : cannot open the connection
## In addition: Warning message:
## In file(file, "r") : cannot open file 'ffdat': No such file or directory
Any suggestions?
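Ah, just spotted the error: you must remove the quotes from around "ffdat" on the final line, and likewise from around "expr". With the quotes, read.eset receives the literal strings "ffdat" and "expr" as file names rather than the paths stored in those variables.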
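With the quotes removed, the call in the loop would read read.eset(exprs.file=expr, pdat.file=pdat, fdat.file=ffdat). The difference between a quoted name and a variable can be shown with base R alone. This is only a sketch, with read.table() standing in for read.eset and a made-up temporary file standing in for the real inputs:

```r
# Hypothetical file standing in for one of the per-chromosome inputs.
ffdat <- tempfile(fileext = ".txt")
writeLines(c("probe1 A", "probe2 B"), ffdat)

# "ffdat" in quotes is the literal file name ffdat, which does not exist here.
# Warnings are suppressed so that only the error message is captured.
quoted <- tryCatch(suppressWarnings(read.table("ffdat")),
                   error = function(e) conditionMessage(e))
print(quoted)

# Without quotes, R passes the path stored in the variable.
unquoted <- read.table(ffdat)
print(unquoted)
unlink(ffdat)
```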
