I am following a toolset called IUTA, which needs a character vector of file paths.
The example in the program shows that I have to do the following:
## set the paths for the BAM file and GTF file
## notice that the gtf file contains correct gene_id information
bam.list.1<-system.file("bamdata",paste("sample_",1:3,".bam",sep=""),
package="IUTA")
bam.list.2<-system.file("bamdata",paste("sample_",4:6,".bam",sep=""),
package="IUTA")
transcript.info<-system.file("gtf","mm10_kg_sample_IUTA.gtf",
package="IUTA")
OK, I tried that, but it does not create the paths for my BAM files or for my GTF file.
When I run the program, I get the following error because it can neither find nor open my BAM files:
IUTA(bam.list.1, bam.list.2, transcript.info,
+ rep.info.1 = rep(1, length(bam.list.1)),
+ rep.info.2 = rep(1, length(bam.list.2)),
+ output.dir = paste(getwd(), "/IUTA", sep = ""),
+ output.na = TRUE,
+ genes.interested = "all")
All 8 cores on the machine will be used.
Preparing indices of bam files ......
open: No such file or directory
Error in FUN(X[[i]], ...) : failed to open SAM/BAM file
file: ''
What exactly am I doing wrong? Note that I am working from an external hard drive, and my working directory is set to that drive's directory.
Many thanks.
http://www.niehs.nih.gov/research/resources/assets/docs/iuta_pdf_manual_508.pdf
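system.file() only resolves files that ship inside an installed package. When none of the requested files exist there, it returns the empty string "", which is consistent with the file: '' in the error above. The sketch below assumes your BAM and GTF files sit in a directory of your own. data.dir and my_annotation.gtf are placeholders, and the files are created in a temporary folder only so the example runs. It builds the vectors with file.path() instead of system.file():

```r
# system.file() looks only inside an installed package; a path it cannot
# find there comes back as the empty string ""
not.in.package <- system.file("bamdata", "sample_1.bam", package = "stats")
print(not.in.package)

# Hypothetical data directory; replace with the folder on your external drive.
# A temporary folder is used here so the sketch runs anywhere.
data.dir <- file.path(tempdir(), "iuta_data")
dir.create(data.dir, showWarnings = FALSE)
invisible(file.create(file.path(data.dir,
  c(paste0("sample_", 1:6, ".bam"), "my_annotation.gtf"))))

# Build full paths to your own files instead of calling system.file()
bam.list.1 <- file.path(data.dir, paste0("sample_", 1:3, ".bam"))
bam.list.2 <- file.path(data.dir, paste0("sample_", 4:6, ".bam"))
transcript.info <- file.path(data.dir, "my_annotation.gtf")

# Fail early with a clear message if any path is wrong
all.paths <- c(bam.list.1, bam.list.2, transcript.info)
if (!all(file.exists(all.paths))) {
  stop("Missing files: ", paste(all.paths[!file.exists(all.paths)], collapse = ", "))
}
```

The resulting bam.list.1, bam.list.2 and transcript.info can then be passed to IUTA() exactly as in the call above.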
I have many zip files and want to load only the ones whose names meet a condition,
for example, unzip any file with a name like "Query Transaction History_20221122".
I was able to select them with the script below:
zip_files <-list.files(path ="C:/Users/Guest 1/Downloads",
pattern =".*Query Transaction History_20221122.*zip",full.names = TRUE )
Next, I extract them to a specified folder with the code below, using the plyr package:
ldply(.data = zip_files,.fun = unzip,exdir =my_dir )
and they extract to that folder without any issue.
The problem is that the file name is alphanumeric: it contains a fixed name followed by a date written as digits, as in the sample below:
Query Transaction History_20221122
Since I will do this daily, I want code that automatically updates the numeric (date) part of the zip file name.
I tried using glue() from the glue package, as shown below:
checks<-format(Sys.Date(),"%Y%m%d")
zip_files <-list.files(path ="C:/Users/Guest 1/Downloads",
pattern =glue(".*Query Transaction History_{checks}.*zip",full.names = TRUE ))
This ran without error, but when I tried to extract the files using the second script
ldply(.data = zip_files,.fun = unzip,exdir =my_dir )
it returned the error below:
In addition: Warning message:
In FUN(X[[i]], ...) : error 1 in extracting from zip file
Kindly assist
Thank you
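In the glue() attempt, full.names = TRUE sits inside the glue() call rather than inside list.files(). As a result, list.files() falls back to returning bare file names, and unzip() then looks for them in the working directory instead of in Downloads, which would explain the extraction error. glue() treats extra named arguments as values available for interpolation, so it accepts full.names without complaint. The base-R sketch below builds the date pattern with paste0() and uses lapply() in place of ldply(). The function names are hypothetical, and download_dir and my_dir are placeholders:

```r
# Pattern for files stamped with a given date, e.g. "..._20221122....zip"
todays_zip_pattern <- function(date = Sys.Date()) {
  paste0("Query Transaction History_", format(date, "%Y%m%d"), ".*\\.zip$")
}

# full.names = TRUE is an argument of list.files(), so the result holds
# complete paths that unzip() can open from any working directory
find_todays_zips <- function(download_dir, date = Sys.Date()) {
  list.files(path = download_dir, pattern = todays_zip_pattern(date),
             full.names = TRUE)
}

# Extract every matching archive into my_dir (base R instead of plyr::ldply)
extract_todays_zips <- function(download_dir, my_dir, date = Sys.Date()) {
  zip_files <- find_todays_zips(download_dir, date)
  invisible(lapply(zip_files, unzip, exdir = my_dir))
}

# Usage (placeholder paths):
# extract_todays_zips("C:/Users/Guest 1/Downloads", my_dir)
```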
I am running an R script that performs batch copying/renaming operations. The script is launched from a .bat file through the command prompt on MS Windows. Note that the destination computer where it will run does not allow external connections (internet or otherwise) for installing new packages, so the solution must use base R only.
I can print messages to the screen with cat(), and the output shows up in the .Rout file generated when the script runs. However, the .Rout file is overwritten each time the script is executed, so I would like to create a separate log file instead.
Here's the relevant part of the code:
if(copy_files){
# tried different modes for the file statement below
log_con <- file(paste(format(Sys.time(), '%d-%b-%Y %H:%M:%S'),
'move duplicates.txt'),
open = 'at')
cat('Parent directory:\t\t' , file = log_con, append= F)
cat(parent_folder , file = log_con, append= T)
cat('\nSubfolder to copy files to:\t ', file = log_con, append= T)
cat(subfolder_old, file = log_con, append= T)
cat('\n\n', file = log_con, append= T)
# copying operations here - omitted for brevity
}
Using the cat statements without the file and append arguments works fine, but the code above returns the following error messages:
> log_con <- file(paste(format(Sys.time(), '%d-%b-%Y %H:%M:%S'), 'move duplicates.txt'), open = 'at')
Error in file(paste(format(Sys.time(), "%d-%b-%Y %H:%M:%S"), "move duplicates.txt"), :
cannot open the connection
In addition: Warning message:
In file(paste(format(Sys.time(), "%d-%b-%Y %H:%M:%S"), "move duplicates.txt"), :
cannot open file '07-Aug-2018 15:50:36 move duplicates.txt': Invalid argument
> cat('Parent directory:\t\t' , file = log_con, append= F)
Error in cat("Parent directory:\t\t", file = log_con, append = F) :
cannot open the connection
In addition: Warning message:
In cat("Parent directory:\t\t", file = log_con, append = F) :
cannot open file '07-Aug-2018 15:48:11 move duplicates.txt': Invalid argument
From what I understand, the error occurs because the log file does not exist at the start, so the connection cannot be opened.
The documentation and the answer to How to create periodically send text to a "log file" while printing normal output to console? seem to suggest that including append = F in the first cat statement ought to work. I have tried versions with different modes (or no mode) in the file() call, with the same result. Answers to Add lines to a file seem to suggest the same. Am I missing something?
I could create one file and have R append lines to it each time, but I want a unique log for each run of the script.
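The "Invalid argument" warning points at the file name itself. Windows does not allow ':' in file names, and the '%H:%M:%S' timestamp puts colons into the name. Also, cat()'s append argument is only used when file is a file name, not a connection. With an open connection, the mode given to file() decides, and "wt"/"at" both create the file if it does not exist. The base-R sketch below uses hyphens in the timestamp. parent_folder and subfolder_old are placeholder values, and the log goes to tempdir() only so the example runs:

```r
# Placeholder values standing in for the script's real variables
parent_folder <- "D:/projects/source"
subfolder_old <- "D:/projects/source/old"

# Colons are not allowed in Windows file names, so use hyphens in the time
log_name <- paste(format(Sys.time(), "%d-%b-%Y %H-%M-%S"), "move duplicates.txt")
log_path <- file.path(tempdir(), log_name)  # use your own log directory here

# "wt" creates a new file for this run; the connection stays open for all cat() calls
log_con <- file(log_path, open = "wt")
cat("Parent directory:\t\t", parent_folder, "\n", file = log_con, sep = "")
cat("Subfolder to copy files to:\t", subfolder_old, "\n\n", file = log_con, sep = "")
# ... copying operations, with further cat(..., file = log_con) calls ...
close(log_con)
```

Because the timestamp resolves to seconds, each run gets its own log file.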
One way to solve this is to use sink(): put sink("R_code.log") at the start of the code, and put sink() as the last line, once the whole process is done. I hope this solves your problem. If you want to name the log file dynamically, build the name with paste0() inside the sink() call.
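A minimal sketch of that approach follows. It assumes a timestamped name built with paste0() (again without colons) and uses split = TRUE, so output still reaches the console (and the .Rout file) as well as the log. The tempdir() location is only there so the example runs. Note that sink() diverts printed output; messages and warnings go to a separate stream.

```r
# Timestamped log name; hyphens instead of colons keep it valid on Windows
log_file <- file.path(tempdir(),
                      paste0("R_code_", format(Sys.time(), "%Y-%m-%d_%H-%M-%S"), ".log"))

sink(log_file, split = TRUE)   # divert output to the log and keep it on screen
cat("Copy run started\n")
# ... copying operations that print or cat() progress ...
cat("Copy run finished\n")
sink()                         # restore normal output
```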
I'm new to RStudio and wasn't fully aware of this site's terms and conditions, so I was blocked from asking questions for 5 days.
I have code for importing multiple files from a directory into R.
The problem is that this code sometimes runs and sometimes fails with the error below.
I have tried to find a solution but haven't found one yet.
library(data.table)
t = setwd("/home/dp/vishan/olp_data/19164/1/")
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files = rownames(files)[files$size > 0]
temp <- lapply(files, fread, sep=",")
Error:
Error in FUN(X[[i]], ...) :
'input' can not be a directory name, but must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself.
Thanks in advance!
Try using:
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files <- subset(files, !isdir & size > 0)
temp <- lapply(rownames(files), fread, sep=',')
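This is needed because list.files() returns directories as well as files (non-recursive listings always include subdirectory names). The data frame that file.info() creates in files can easily be subset on its isdir column, which indicates whether each entry is a directory or a file.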
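Separately, setwd() returns the previous working directory, not the new one. This means t = setwd(...) in the question lists whatever directory R was in before the call, which may explain why the code only sometimes works. The sketch below passes the path straight to list.files() and keeps only non-empty regular files. list_data_files is a hypothetical helper name, read.csv() stands in for fread() so the sketch runs in base R, and the directory contents are made up for the example:

```r
# Return full paths of non-empty regular files (no subdirectories) in dir
list_data_files <- function(dir) {
  info <- file.info(list.files(path = dir, full.names = TRUE))
  rownames(info)[!info$isdir & info$size > 0]
}

# Example directory with one CSV, one empty file and one subdirectory
data_dir <- file.path(tempdir(), "olp_demo")  # e.g. "/home/dp/vishan/olp_data/19164/1/"
dir.create(file.path(data_dir, "subdir"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("a,b", "1,2"), file.path(data_dir, "part1.csv"))
invisible(file.create(file.path(data_dir, "empty.csv")))

files <- list_data_files(data_dir)
temp <- lapply(files, read.csv)  # with data.table: lapply(files, fread, sep = ",")
```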
I am trying and failing to write a process that will download a .zip archive, extract a particular Excel file from that archive, and load that Excel file into my R workspace without ever writing any of those files (the .zip or the .xls) to my hard drive.
I have written a version of this process that works for zipped .csvs, but it doesn't work for .xls. Here's how that version goes, using one of the URLs I'm targeting in my current project and using readWorksheetFromFile() instead of read.csv() at the appropriate moment:
library(XLConnect)
waed.old.link <- "http://eventdata.parusanalytics.com/data.dir/pitf.world.19950101-20121231.xls.zip"
waed.old.file <- "pitf.world.19950101-20121231.xls"
tmp <- tempfile()
download.file(waed.old.link, tmp)
tmp2 <- tempfile()
tmp2 <- unz(tmp, waed.old.file)
WAED.old <- readWorksheetFromFile(tmp2, sheet = 1, startRow = 3, startCol = 1, endCol = 73)
unlink(tmp)
unlink(tmp2)
And here's what pops up after line 8, the one that tries to ingest the spreadsheet as WAED.old:
Error in path.expand(filename) : invalid 'path' argument
I also tried read_excel() at that step and got the same result:
> WAED.old <- read_excel(tmp2, skip = 2)
Error in file.exists(path) : invalid 'file' argument
I gather that this has something to do with pointing readWorksheetFromFile() at a connection rather than a file, but I'm not sure that's right, and I don't know how to fix it if it is. I searched Stack Overflow and the web for an answer but couldn't find one that was right on point. I'd really appreciate some help.
As you say, it is because unz returns a connection object for the file within the zip (but does not explicitly unzip that file), while readWorksheetFromFile expects a path to a file.
Use unzip to explicitly unzip the file.
tmp2 <- unzip(zipfile=tmp, files = waed.old.file, exdir=tempdir())
# readWorksheetFromFile(tmp2, ...)
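Below is a fuller sketch of the answer's approach. It assumes the reader function (XLConnect::readWorksheetFromFile, readxl::read_excel or similar) accepts a file path. The archive and the extracted file go into R's temporary area and are deleted when the function returns, which is as close to "never writing to disk" as path-based readers allow. read_from_zip_url is a hypothetical helper name. mode = "wb" matters on Windows, where a text-mode download can corrupt binary files:

```r
# Download a zip, extract one member into a temp folder, read it with `reader`,
# and delete the temporary files when the function exits
read_from_zip_url <- function(url, member, reader, ...) {
  zip_path <- tempfile(fileext = ".zip")
  out_dir <- tempfile("unzipped_")
  on.exit(unlink(c(zip_path, out_dir), recursive = TRUE), add = TRUE)

  # mode = "wb" keeps the binary archive intact on Windows
  download.file(url, zip_path, mode = "wb", quiet = TRUE)

  # unzip() creates exdir if needed and returns the path(s) it extracted
  extracted <- unzip(zip_path, files = member, exdir = out_dir)
  if (length(extracted) == 0) stop("'", member, "' was not extracted from the archive")
  reader(extracted, ...)
}

# Usage with the question's file (requires XLConnect and network access):
# WAED.old <- read_from_zip_url(waed.old.link, waed.old.file,
#                               XLConnect::readWorksheetFromFile,
#                               sheet = 1, startRow = 3, startCol = 1, endCol = 73)
```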
I'm using unz to extract data from a file within an archive. This works pretty well, but unfortunately I have a lot of zip files and need to check whether a specific file exists within each archive. I could not get a working solution with if (exists(...)) / else.
Does anyone know how to check whether a file exists in an archive without first extracting the whole archive?
Example:
read.table(unz("D:/Data/Test.zip", "data.csv"), sep = ";")[-1,]
This works pretty well if data.csv exists but gives an error if the file is not available in the archive Test.zip.
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot locate file 'data.csv' in zip file 'D:/Data/Test.zip'
Any comments are welcome!
You could use unzip(file, list = TRUE)$Name to get the names of the files in the zip without having to unzip it. Then you can check to see if the files you need are in the list.
## character vector of all file names in the zip
fileNames <- unzip("D:/Data/Test.zip", list = TRUE)$Name
## check if any of those are 'data.csv' (or others)
check <- basename(fileNames) %in% "data.csv"
## extract only the matching files
if(any(check)) {
unzip("D:/Data/Test.zip", files = fileNames[check], junkpaths = TRUE)
}
You could probably put another if() statement to run unz() in cases where there is only one matched file name, since it's faster than running unzip() on a single file.
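Following that suggestion, here is a sketch of a helper (the name read_zip_member_if_present is hypothetical) that checks the archive's listing first and only then reads through unz(), returning NULL when the member is missing. It passes the name exactly as stored in the archive to unz(), because a file inside a subfolder is stored with its folder prefix:

```r
# Read `member` from `zipfile` via unz() if the archive contains it; else NULL.
# unzip(list = TRUE) only lists the archive's contents; nothing is extracted.
read_zip_member_if_present <- function(zipfile, member, ...) {
  stored <- unzip(zipfile, list = TRUE)$Name
  hits <- stored[basename(stored) == member]
  if (length(hits) == 0) return(NULL)
  # Use the first match with its full stored path (it may sit in a subfolder)
  read.table(unz(zipfile, hits[1]), ...)
}

# Usage with the question's archive:
# dat <- read_zip_member_if_present("D:/Data/Test.zip", "data.csv", sep = ";")
# if (!is.null(dat)) dat <- dat[-1, ]
```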