Copy files which match a requirement to a new folder - r

I have an original folder with 2000 files (patha). Now I want to copy only the files which match my requirement (listed in grdc_no) to a new path (pathb). Here is my attempt:
grdc_no <- grdc$grdc_no
# list of file names that satisfy my requirement
all_files <- list.files("patha", full.names = TRUE)
for (f in all_files) {
  for (i in 1:length(grdc_no)) {
    if (f == grdc_no[i]) {
      file.copy(f, "pathb")
    } else {}
  }
}
However, it does not work. Any advice for me in this case? Many thanks

You can easily do this without a loop (and especially a nested one) using lapply:
lapply(all_files[basename(all_files) %in% grdc_no],function(x) file.copy(x,"pathb"))
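This selects the files in all_files whose file name (without the folder part) appears in the vector grdc_no and applies file.copy to each of them. Your loop never copies anything because, assuming grdc_no holds bare file names, list.files(..., full.names = TRUE) returns paths such as "patha/<name>", which never equal those names; basename() strips the folder before the comparison.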
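A self-contained sketch of this approach, runnable anywhere: the temporary folders patha_demo and pathb_demo, the file names and the wanted vector are made-up stand-ins for patha, pathb and grdc_no. Since file.copy() is vectorised over its from argument, the lapply() is not strictly needed.

```r
# Hypothetical stand-ins for "patha", "pathb" and grdc_no
src <- file.path(tempdir(), "patha_demo")
dst <- file.path(tempdir(), "pathb_demo")
dir.create(src, showWarnings = FALSE)
dir.create(dst, showWarnings = FALSE)
# three dummy files in the source folder
invisible(file.create(file.path(src, c("1104150.txt", "1104200.txt", "9999999.txt"))))

wanted <- c("1104150.txt", "1104200.txt")   # plays the role of grdc_no

all_files <- list.files(src, full.names = TRUE)
# compare the bare file names (no folder part) against the wanted names
to_copy <- all_files[basename(all_files) %in% wanted]
# file.copy() accepts a vector of sources and an existing target folder
ok <- file.copy(to_copy, dst, overwrite = TRUE)
print(ok)
print(list.files(dst))
```

If grdc_no stores IDs without file extensions, compare tools::file_path_sans_ext(basename(all_files)) against it instead.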

Related

source file specified as string in R

I want to programmatically source all .R files in the character vector returned by the Sys.glob() function.
This is the code I wrote:
# fetch the different ETL parts
parts <- Sys.glob("scratch/*.R")
if (length(parts) > 0) {
  for (part in parts) {
    # source the ETL part
    source(part)
    # rest of code goes here
    # ...
  }
} else {
  stop("no ETL parts found (no data to process)")
}
The problem is that I cannot do this, or at least I get the following error:
simpleError in source(part): scratch/foo.bar.com-https.R:4:151: unexpected string constant
I've tried different combinations for the source() function like the following:
source(sprintf("./%s", part))
source(toString(part))
source(file = part)
source(file = sprintf("./%s", part))
source(file = toString(part))
No luck. As I'm globbing the contents of a directory, I need to tell R to source those files. As it's a custom-tailored ETL (extract, transform and load) script, I could write them out manually:
source("scratch/foo.bar.com-https.R")
source("scratch/bar.bar.com-https.R")
source("scratch/baz.bar.com-https.R")
But that's dirty, and right now there are 3 different extraction patterns. There could be 8, 80 or even 2000 different patterns, so writing them by hand is not an option.
How can I do this?
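The path itself is not the problem: scratch/foo.bar.com-https.R:4:151: unexpected string constant is a parse error reported at line 4, column 151 of that file, so source() did find the file; it contains invalid R code at that position. To source all the files, try getting the list of files with dir and then using lapply.
For example, if your files are of the form t1.R, t2.R, etc., and are inside the folder "StackOverflow", do:
d = dir(pattern = "^t\\d+\\.R$", path = "StackOverflow/", recursive = TRUE, full.names = TRUE)
m = lapply(d, source)
The option recursive = TRUE will search all subdirectories, and full.names = TRUE will add the path to the filenames.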
If you still want to use Sys.glob(), this works too:
d = Sys.glob(paths = "StackOverflow/t*.R")
m = lapply(d, source)
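The error in the question points at line 4, column 151 of one of the sourced files. To find out which globbed files are broken, a sketch like the following sources each file separately and records the error message instead of stopping at the first failure. The folder scratch_demo and the files good.R and bad.R are made up for the demo; bad.R deliberately reproduces the "unexpected string constant" parse error.

```r
# Hypothetical demo folder with one valid and one broken script
etl_dir <- file.path(tempdir(), "scratch_demo")
dir.create(etl_dir, showWarnings = FALSE)
writeLines("x_ok <- 1", file.path(etl_dir, "good.R"))
# two string constants in a row: a parse error, as in the question
writeLines('y <- "a" "b"', file.path(etl_dir, "bad.R"))

parts <- Sys.glob(file.path(etl_dir, "*.R"))
# source each file in its own environment; keep "ok" or the error text
results <- vapply(parts, function(part) {
  tryCatch({
    source(part, local = new.env())
    "ok"
  }, error = function(e) conditionMessage(e))
}, character(1))
print(results)
```

Because the result is named by file path, the failing scripts can be listed with names(results)[results != "ok"].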

Conditionally process (bgzip, tabix) files using loop and if else statement

I have some .vcf files. I have selected those files from my directory and want to convert them to two other formats.
I am a bit confused about using if and else if here. This is what I want: if there isn't a .bgz file for the i-th .vcf file, I want to convert the .vcf file to a .bgz file, keeping the original file.
If there is already a .bgz file but no .bgz.tbi file for the i-th .bgz file, then I want to create the .bgz.tbi index from the .bgz file, keeping the original .bgz file that I got from the .vcf file.
Can someone please help me finish this loop? It works for the if condition, but I don't know how to proceed from there.
path.file<-"/mypath/for/files/"
all.files <- list.files("/mypath/for/files")
all.files <- all.files[grepl(".vcf$",all.files)]
for (i in 1:length(all.files)){
  if(!exists(paste0(all.files[i],".bgz"))){
    bgzip(paste0(path.file,all.files[i]), overwrite=FALSE)
  }else{(!exists(paste0(all.files[i],".bgz",".tbi"))){
    #if(!exists(paste0(all.files[i],".bgz",".tbi"))){
    indexTabix(paste0(paste0(path.file,all.files[i]),".bgz"), format="vcf")
  }
}
Try this (not tested). Note that exists() looks for an R object with the given name, not a file on disk, so the checks need file.exists():
library(Rsamtools)  # provides bgzip() and indexTabix()
#get VCF files with path
all.files <- list.files("/mypath/for/files", pattern = "\\.vcf$",
                        full.names = TRUE)
for (i in all.files) {
  #make output names, so we don't mess about with paste
  file_bgz <- paste0(i, ".bgz")
  file_bgz_tbi <- paste0(i, ".bgz.tbi")
  #if bgz exists don't zip else zip
  if (!file.exists(file_bgz))
    bgzip(i, dest = file_bgz)
  #if tbi exists don't index else tabix (the index is written as <file_bgz>.tbi)
  if (!file.exists(file_bgz_tbi))
    indexTabix(file_bgz, format = "vcf")
}
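Two points matter for this loop: exists() looks up R objects by name rather than files on disk, so file checks need file.exists(); and two independent if statements (rather than if/else) let a file that was just zipped also be indexed in the same pass. The sketch below illustrates both without needing Rsamtools; steps_needed() is a hypothetical helper, and the files are empty placeholders in a temporary folder.

```r
# exists() looks for an R variable with that name; file.exists() checks the disk
f <- tempfile(fileext = ".vcf")
writeLines("##fileformat=VCFv4.2", f)
exists(f)        # FALSE: no R variable has that name
file.exists(f)   # TRUE

# Hypothetical helper: which steps does a .vcf file still need?
# Two independent ifs, so a file lacking both outputs gets both steps.
steps_needed <- function(vcf) {
  steps <- character(0)
  if (!file.exists(paste0(vcf, ".bgz")))     steps <- c(steps, "bgzip")
  if (!file.exists(paste0(vcf, ".bgz.tbi"))) steps <- c(steps, "index")
  steps
}

vcf_dir <- file.path(tempdir(), "vcf_demo")
dir.create(vcf_dir, showWarnings = FALSE)
a <- file.path(vcf_dir, "a.vcf")
b <- file.path(vcf_dir, "b.vcf")
# placeholders: b already has its .bgz, neither has a .tbi index
invisible(file.create(c(a, b, paste0(b, ".bgz"))))
steps_needed(a)  # "bgzip" "index"
steps_needed(b)  # "index"
```

With an if/else chain instead, a.vcf would only be zipped on this run and indexed on the next one.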

file.copy in R not working

I am using the command file.copy in R and it throws an error, but I can't spot the reason.
file.copy(from="Z:/Ongoing/Test", to = "C:/Users/Darius/Desktop", overwrite = TRUE, recursive = TRUE)
Warning message:
In file.copy(from = "Z:/Ongoing/Test",:
problem copying Z:/Ongoing/Test to C:/Users/Darius/Desktop/Test: No such file or directory
Can anyone see the problem? The command doesn't work, even though it only gives a warning message.
Actually, I don't think there is any straightforward way to copy a directory. I have written a function which might help you.
This function takes two input arguments:
from: The complete path of the directory to be copied
to: The location to which the directory is to be copied
Assumption: from and to are each the path of a single directory.
dir.copy <- function(from, to){
  ## check if from and to directories are valid
  if (!dir.exists(from)){
    cat('from: No such Directory\n')
    return (FALSE)
  }
  else if (!dir.exists(to)){
    cat('to: No such Directory\n')
    return (FALSE)
  }
  ## extract the directory name from 'from' (basename() is safer than splitting on '/')
  dir_name <- basename(from)
  new_to <- file.path(to, dir_name)
  ## create the directory in 'to'
  dir.create(new_to, showWarnings = FALSE)
  ## copy all regular files (list.files() also returns subdirectories)
  file_inside <- list.files(from, full.names = TRUE)
  file_inside <- file_inside[!dir.exists(file_inside)]
  file.copy(from = file_inside, to = new_to)
  ## copy all subdirectories recursively
  dir_inside <- list.dirs(path = from, recursive = FALSE)
  for (sub_dir in dir_inside)
    dir.copy(sub_dir, new_to)
  return (TRUE)
}
file.copy() doesn't create the destination directory, so it only works if you're copying into folders that already exist.
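A sketch of that fix, assuming the goal is to copy the whole Test folder: create the destination first, then let file.copy() copy the directory tree with recursive = TRUE. Temporary folders stand in for the Z:/ and C:/ paths from the question, and the sketch does not diagnose why the original network path failed.

```r
# Hypothetical source tree standing in for Z:/Ongoing/Test
src <- file.path(tempdir(), "Test")
dir.create(file.path(src, "sub"), recursive = TRUE, showWarnings = FALSE)
writeLines("hello", file.path(src, "sub", "a.txt"))

# stands in for C:/Users/Darius/Desktop
dest <- file.path(tempdir(), "Desktop_demo")
# file.copy() will not create `dest`, so create it (and any parents) first
dir.create(dest, recursive = TRUE, showWarnings = FALSE)
# with recursive = TRUE and an existing target folder, the Test folder
# and its contents are copied into dest
ok <- file.copy(from = src, to = dest, recursive = TRUE, overwrite = TRUE)
print(ok)
print(list.files(dest, recursive = TRUE))
```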
I had a similar issue:
This blog was helpful. I slightly modified the code by adding full.names = TRUE and overwrite = TRUE.
current.folder <- "E:/ProjectDirectory/Data/"
new.folder <- "E:/ProjectDirectory/NewData/"
list.of.files <- list.files(current.folder, full.names = TRUE)
# copy the files to the new folder
file.copy(list.of.files, new.folder, overwrite = TRUE)

How to read all the files in a folder using R and create objects with the same file names?

I need to create a function in R that reads all the files in a folder (let's assume that all files are tab-delimited tables) and creates objects with the same names in the global environment. I did something similar (see code below): I was able to write a function that reads all the files in the folder, makes some changes to the first column of each file and writes each one back to the folder. But I couldn't find out how to assign the files I read to objects that will stay in the global environment.
changeCol1 <- function () {
  filesInfolder <- list.files()
  for (i in 1:length(filesInfolder)){
    wrkngFile <- read.table(filesInfolder[i])
    wrkngFile[,1] <- gsub(0,1,wrkngFile[,1])
    write.table(wrkngFile, file = filesInfolder[i], quote = F, sep = "\t")
  }
}
You are much better off assigning them all to elements of a named list (and it's pretty easy to do, too):
changeCol1 <- function () {
  filesInfolder <- list.files()
  lapply(filesInfolder, function(fname) {
    wrkngFile <- read.table(fname)
    wrkngFile[,1] <- gsub(0, 1, wrkngFile[,1])
    write.table(wrkngFile, file=fname, quote=FALSE, sep="\t")
    wrkngFile
  }) -> data
  names(data) <- filesInfolder
  data
}
a_list_full_of_data <- changeCol1()
Also, F will come back to haunt you some day: unlike TRUE and FALSE, which are reserved words, F is an ordinary variable that can be reassigned.
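A quick illustration of the point about F (run it in a scratch session, since it temporarily masks F in the global environment):

```r
# F is an ordinary variable that base R binds to FALSE, so it can be masked
F <- 1                        # legal assignment in the global environment
f_is_false <- identical(F, FALSE)
f_is_false                    # FALSE: F now means 1
# FALSE is a reserved word, so assigning to it is an error
res <- tryCatch(eval(parse(text = "FALSE <- 1")),
                error = function(e) "assignment refused")
res                           # "assignment refused"
rm(F)                         # remove the mask; base F is visible again
identical(F, FALSE)           # TRUE
```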
Add this to your loop after making the changes:
assign(filesInfolder[i], wrkngFile, envir = globalenv())
If you want to put them into a list instead, one way is to declare a list outside your loop:
mylist = list()
Then, within your loop, do:
mylist[[filesInfolder[i]]] = wrkngFile
And then you can access each object with:
mylist[[filename]]
Since the loop runs inside changeCol1(), return mylist from the function to use it in the global environment.
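A self-contained sketch combining both answers: read every tab-delimited file into a named list, then, only if separate objects are really wanted, copy the list elements into the global environment with list2env(). The folder tables_demo and its two files are made up, and naming each element after the file name without its extension (via tools::file_path_sans_ext()) is an assumption about what "the same names" should mean.

```r
# Hypothetical demo folder with two tab-delimited tables
tbl_dir <- file.path(tempdir(), "tables_demo")
dir.create(tbl_dir, showWarnings = FALSE)
write.table(data.frame(id = c("a0", "b0"), v = 1:2),
            file.path(tbl_dir, "first.txt"), sep = "\t", quote = FALSE,
            row.names = FALSE)
write.table(data.frame(id = c("c0", "d0"), v = 3:4),
            file.path(tbl_dir, "second.txt"), sep = "\t", quote = FALSE,
            row.names = FALSE)

files <- list.files(tbl_dir, pattern = "\\.txt$", full.names = TRUE)
tables <- lapply(files, function(f) {
  x <- read.delim(f, stringsAsFactors = FALSE)
  x[[1]] <- gsub("0", "1", x[[1]])   # same edit as in the question
  x
})
# name each element after its file, without folder or extension
names(tables) <- tools::file_path_sans_ext(basename(files))
# optional: one object per file in the global environment (first, second)
invisible(list2env(tables, envir = globalenv()))
```

Keeping the data in tables is usually easier to work with (for example with lapply over the list); list2env() is only there for the case where separate objects are required.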

Scan for string across multiple files with R

I would like to scan multiple files for strings in R and know which file names have that string.
Is there a way to do this with something like grep, cat, readLines in a function maybe?
If I scan the files using:
fileNames <- Sys.glob("*.csv")
then maybe something like:
for (f in fileNames) {
  stuff <- read.csv(fileName, sep = ",")
  grep("string")
}
names(res) <- substr(filenames, 1, 30)
Or maybe even better, a loop like this:
for( f in filenames ){
  cat("string", file=f)
}
for( f in filenames) {
  cat(readLines(f), sep="\n")
}
This code doesn't work; I'm just trying to think this through. I'm certain there is a better way to do this. It sounds simple, but I can't get it right.
I want to scan files for strings and then output the names of the files where the string was found. I have not found an example of this in R.
Suggestions?
Note that in your first code example you use f as the loop variable, while inside the loop you use fileName instead (also, R is case sensitive, so fileNames and filenames are different objects).
If it's unlikely that your search string contains the CSV delimiter, you can indeed use readLines() together with grep(). grep() then returns a vector of the line numbers where the string occurs, so a non-zero length means the file contains it. Try the following code:
fileNames <- Sys.glob("*.csv")
for (fileName in fileNames) {
  if (length(grep("string", readLines(fileName))) > 0) { print(fileName) }
}
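The same idea can be wrapped into a function that returns the matching file names instead of printing them. files_containing() below is a hypothetical helper, not a base R function; fixed = TRUE treats the search string literally rather than as a regular expression, and the demo folder and files are made up.

```r
# Hypothetical helper: which of `files` contain `pattern` on any line?
files_containing <- function(pattern, files, fixed = TRUE) {
  hit <- vapply(files, function(f) {
    any(grepl(pattern, readLines(f, warn = FALSE), fixed = fixed))
  }, logical(1))
  files[hit]
}

csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("a,b", "1,needle"), file.path(csv_dir, "x.csv"))
writeLines(c("a,b", "2,hay"),    file.path(csv_dir, "y.csv"))

fileNames <- Sys.glob(file.path(csv_dir, "*.csv"))
basename(files_containing("needle", fileNames))   # "x.csv"
```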
