Renaming files based on names of other files in R

Right now I have two folders, each with the same number of .dat files. Folder 1 holds my original data files, and folder 2 holds files where I have deleted some data from the folder 1 files. I'm trying to see how missing data changes my average, so I randomly delete some data, run the same statistics, and compare the result with the original data set. The files in folder 1 have names like
kn_2014_01_09_0600.dat
kn_2014_01_09_0700.dat
kn_2014_01_09_0800.dat
and so on
After I read them into R and process them as needed, I write the files to a new folder, where they end up with the names
1.dat
2.dat
3.dat
and so on
How can I change the names in my output folder to match the original names?
So my output file 1.dat should become kn_2014_01_09_0600.dat, 2.dat should become kn_2014_01_09_0700.dat, and so on.
Right now I have
nfiles <- list.files('new file location')
ofiles <- list.files('original file location')
lapply(nfiles,function(i){file.rename(from=i,to= )})
I don't know what to put for the to argument. I thought it might be something like
lapply(nfiles,function(i){file.rename(from=i,to=ofiles[i])})
or
lapply(nfiles,function(i){file.rename(from=i,to=ofiles[1:length(ofiles)})
but neither of those worked. Any suggestions?
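A minimal sketch, not a tested answer: inside lapply(nfiles, function(i) ...), i is a file name such as "1.dat", so ofiles[i] indexes by name and returns NA. Pairing the two vectors by position avoids that. The sketch assumes output k was written from the k-th original file in sorted order; the function name rename_like_originals is hypothetical.

```r
# Rename the numbered files (1.dat, 2.dat, ...) in new_dir so that file k
# takes the name of the k-th original file in orig_dir.
# Assumption: output k was produced from the k-th original in sorted order.
rename_like_originals <- function(new_dir, orig_dir, pattern = "\\.dat$") {
  ofiles <- sort(list.files(orig_dir, pattern = pattern))
  nfiles <- list.files(new_dir, pattern = pattern)
  # list.files() sorts as text ("10.dat" before "2.dat"), so order the
  # numbered files by their numeric stem instead
  nfiles <- nfiles[order(as.integer(tools::file_path_sans_ext(nfiles)))]
  stopifnot(length(nfiles) == length(ofiles))
  # file.rename() is vectorised over from/to, so no lapply() is needed;
  # list.files() returns bare names, so rebuild the full paths
  file.rename(from = file.path(new_dir, nfiles),
              to   = file.path(new_dir, ofiles))
}
```

Called as rename_like_originals("new file location", "original file location"), it returns one TRUE/FALSE per file. The numeric reordering matters as soon as there are ten or more files.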

Related

How to select specific files according to a spreadsheet criteria and then copy from directory to another directory in R?

I have a task that requires me to use a specific column in a CSV spreadsheet that stores the file names, for example:
File Name
CA-001
WV-001
ma-001
My task is to move some files from folder 'source' to folder 'target'.
And I'm using this csv spreadsheet as a crosswalk to select the files whose names match the values in the 'File Name' column. I then want R to copy them from the source folder, which contains not only these files but also others that are not in the list (e.g. CO-001, SC-001, ...). All of the files are PDFs, so file type isn't a concern. I want only the files whose names match the csv spreadsheet. How can I do this?
I have some sample code below, but it still doesn't work.
source <- "C:/Users/53038/MovePDF/Test_From"
target <- "C:/Users/53038/MovePDF/Test_To"
all.files <- list.files(path = source)
csvfile <- read.csv('C:/Users/53038/MovePDF/Master.csv')
toCopy <- all.files[all.files %in% csvfile$Move]
file.copy(toCopy, target)
Thank you!
With your code, the names you want to match are in csvfile$File.Name (read.csv() turns the header 'File Name' into File.Name), not csvfile$Move.
I'm assuming the source directory is potentially very large. Instead of matching substrings with slow regular expressions (unnecessary, since we know the exact file names) or listing the whole directory (also slow), I only check whether the exact wanted file names exist before copying them:
source <- "C:/Users/53038/MovePDF/Test_From"
target <- "C:/Users/53038/MovePDF/Test_To"
csvfile <- read.csv('C:/Users/53038/MovePDF/Master.csv')
# add .pdf suffix
toCopy <- paste0(csvfile$File.Name,'.pdf')
# add source directory path
toCopy <- file.path(source, toCopy)
# optional: keep only the files in toCopy that actually exist. You can skip this step if you're sure they all exist and/or you don't mind file.copy() returning FALSE for the missing ones
toCopy <- toCopy[file.exists(toCopy)]
# copy the files
file.copy(toCopy, target, overwrite = TRUE)
I would prefer to keep the .pdf extension in the file names at all times, including in the source CSV. Appending '.pdf' causes a problem on case-sensitive filesystems (almost all Linux installations, rarely macOS or Windows) when the actual extension is .PDF, .Pdf, etc.
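If the extensions in the source folder are mixed (.pdf, .PDF, ...), one alternative is to list the folder once after all and compare names case-insensitively. This gives up the speed advantage described above in exchange for robustness. The helper copy_listed_pdfs is hypothetical, and treating CA-001 and ca-001 as the same file is an assumption.

```r
# Copy the PDFs named in `wanted` (names without extension, as in the CSV's
# "File Name" column) from source to target, ignoring case in both the name
# and the extension. Returns the names of the files it copied.
copy_listed_pdfs <- function(wanted, source, target) {
  all_files <- list.files(source)
  is_pdf <- grepl("\\.pdf$", all_files, ignore.case = TRUE)
  stems  <- tools::file_path_sans_ext(all_files)
  to_copy <- all_files[is_pdf & tolower(stems) %in% tolower(wanted)]
  file.copy(file.path(source, to_copy), target, overwrite = TRUE)
  to_copy
}
```

With the question's objects this would be called as copy_listed_pdfs(csvfile$File.Name, source, target).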

How to set multiple file patterns in R when copying files from one folder to another?

At work, files are added to a folder path as they are received and accepted. They are .wdf files that I need to convert from wdf to csv. I then want to combine the files into a single file that can be filtered by a column name. So I'm trying to pull a subset of the files from numerous folders based on extension and date, copy the ones I want into another folder, and then combine those.
File names that I want to pull are in the form of:
"//xyz/ExternalUsers/em/em18thjudic/uploaded_files/ACCEPTED_201907101310_UIXD#FGE18thJULDWC2Q2019.wdf"
I want all files in that path that end in .wdf and fall within a certain date range (currently the month of July). I would also like it to pull only new files when I run the script, but I haven't figured that out yet. I can get it to pull files by either the date or the file type, but not both.
I tried using tapply with file.mtime to pull by date. That didn't work, so I tried selecting files whose names contain certain upload dates.
files <- list.files(
path="//sptw02/ExternalUsers/em",
pattern = "\\.wdf$|._201907.",
full.names = TRUE,
recursive = TRUE)
dirs <- dirname(files)
lastfiles <- tapply(files, dirs, function(v) v[which.max(file.mtime(v))])
What I've tried:
1) pattern = "\\.wdf$|._201907.",
2) pattern = c("(\\.wdf$,._201907.)"),
3) pattern = "\\.wdf"|"._201707.",
I can only get it to pull either the files containing that date in the name or the files with the .wdf extension.
I expect to get only the files that match both patterns and to copy them into another folder. Instead, it copies every file that has either .wdf or _201907; I cannot get it to require both. It pulls in everything when it copies.
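A sketch of one way to require both conditions, assuming (as in the sample name) that the upload stamp _YYYYMM... comes before the .wdf extension. In a regular expression, | means "or", so "\\.wdf$|._201907." matches files that satisfy either part; a single pattern with both parts in sequence matches only files that satisfy both. The helper names are hypothetical, and treating "new files" as files whose names are not yet in the target folder is an assumption about how you want to track them.

```r
# List .wdf files (in all subfolders) whose names also contain "_<month>",
# e.g. "ACCEPTED_201907101310_....wdf" for month = "201907".
list_month_wdf <- function(path, month = "201907") {
  # pattern is matched against file names; "_201907" must come before ".wdf"
  list.files(path,
             pattern = paste0("_", month, ".*\\.wdf$"),
             full.names = TRUE, recursive = TRUE)
}

# Copy only files not already present in target (matched by file name),
# so re-running the script picks up just the new arrivals.
copy_new_files <- function(files, target) {
  new_files <- files[!basename(files) %in% list.files(target)]
  file.copy(new_files, target)
  new_files
}
```

For example, copy_new_files(list_month_wdf(path), target) with your own path and target folder; a second run copies nothing unless new matching files have arrived.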

How can I write multiple csv files in a specific directory and then merge them into a single csv?

I am trying to extract a subset of data from multiple .grb2 files in the same directory and write each subset to a separate csv file. I am using the following code, which does the job but stores the csv files in the same directory as the .grb2 files.
path <- "file path"
input.file.names <- dir(path, pattern =".grb2")
output.file.names <-
paste0(tools::file_path_sans_ext(input.file.names),".csv")
for(i in 1:length(input.file.names)){
GRIB <- brick(input.file.names[i])
GRIB <- as.array(GRIB)
tmp2m.6hr <- GRIB[46,13,c(1:20)]
str(tmp2m.6hr)
tmp2m.data <- data.frame(tmp2m.6hr)
write.csv(tmp2m.data,output.file.names[i])
}
My first question is this: how can I store the csv files in a different directory than the .grb2 files?
My .grb2 files, and thus the resulting csv files, end in one of four suffixes: 00.grb2, 06.grb2, 12.grb2 or 18.grb2. The resulting csv files have the following form:
My second question is: how can I merge all my 00.csv, 06.csv, 12.csv and 18.csv files (all files of each category in the same column) into a single csv file, in a directory of my choice, with the headers 00_tmp2m.6hr, 06_tmp2m.6hr, 12_tmp2m.6hr and 18_tmp2m.6hr, plus a fifth column with the average of the other four? The result that I want is the following:
As I'm not an experienced user, this is too complicated for me. I would very much appreciate any assistance with this.
For your first question, give write.csv() a path in another folder as its file argument, for example a relative one: write.csv(tmp2m.data, file.path("myfolder", output.file.names[i])). The folder must already exist; create it once with dir.create("myfolder").
Your second question might be easier if you read the data back in and then write your results as a new file. If you want to append to an existing file instead, look at write.table(..., append = TRUE); write.csv() ignores attempts to change append, with a warning.
Also, you might get a better answer by creating a minimal example.
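A sketch for the second question. The screenshots are not available, so it assumes each per-file csv was written by the loop above (with a tmp2m.6hr column), that file names end in 00.csv, 06.csv, 12.csv or 18.csv, and that sorting the names puts the dates in the same order for every cycle. merge_by_cycle and the average column name are hypothetical.

```r
# Stack each cycle's tmp2m.6hr values into one column and add their mean.
# Assumes the per-file CSVs were written by write.csv(tmp2m.data, ...), so
# each has a tmp2m.6hr column, and that names end in 00/06/12/18 + ".csv".
merge_by_cycle <- function(csv_dir, out_file, cycles = c("00", "06", "12", "18")) {
  cols <- lapply(cycles, function(cy) {
    files <- sort(list.files(csv_dir, pattern = paste0(cy, "\\.csv$"),
                             full.names = TRUE))
    # all files of this cycle, stacked in file-name order
    unlist(lapply(files, function(f) read.csv(f)$tmp2m.6hr))
  })
  # every cycle needs the same number of values to sit side by side
  stopifnot(length(unique(lengths(cols))) == 1)
  out <- data.frame(setNames(cols, paste0(cycles, "_tmp2m.6hr")),
                    check.names = FALSE)
  out$average <- rowMeans(out)
  write.csv(out, out_file, row.names = FALSE)
  out
}
```

check.names = FALSE keeps headers such as 00_tmp2m.6hr, which data.frame() would otherwise rename because they start with a digit. Write out_file outside csv_dir if its name could end in one of the cycle suffixes.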

To stack up results in one masterfile in R

Using this script, I have created a separate folder for each csv file and saved all my further analysis results in it. Each folder has the same name as its csv file. The csv files themselves are stored in the main (master) directory.
Now, I have created a csv file in each of these folders which contains a list of all the fitted values.
I would now like to do the following:
Set the working directory to the folder with the same name as the file
Read the fitted values file
Add a row/column stating the name of the site (unique ID)
Add it to the master file, which is stored in the main directory, with a title specifying the site name/file name. It can be stacked by rows or by columns; it doesn't really matter.
Return to the main directory to pick the next file
Repeat the loop
Using merge(), rbind() or cbind() combines all the data under one column name. I want to keep all the sites separate for comparison at a later stage.
This is what I'm using at the moment and I'm lost on how to proceed further.
setwd( "path") # main directory
path <-"path" # need this for convenience while switching back to main directory
# import all files and create a character type array
files <- list.files(path=path, pattern="*.csv")
for(i in seq(1, length(files), by = 1)){
fileName <- read.csv(files[i]) # repeat to set the required working directory
base <- strsplit(files[i], ".csv")[[1]] # getting the filename
setwd(file.path(path, base)) # setting the working directory to the same filename
master <- read.csv(paste(base,"_fiited_values curve.csv"))
# read the fitted value csv file for the site and store it in a list
}
I want to construct a for loop to make one master file with the files in different directories. I do not want to merge all under one column name.
For example, if I have 50 similar csv files, each with two columns of data, I would like one csv file that holds all of it in its original format, rather than appending to the existing rows/columns, so I would end up with 100 columns of data.
Please tell me what further information I can provide.
To read a group of files from several different directories, with path names patha, pathb and pathc:
paths = c('patha','pathb','pathc')
files = unlist(lapply(paths, function(path) list.files(path, pattern = "\\.csv$", full.names = TRUE)))  # pattern is a regular expression, not a wildcard
listContainingAllFiles = lapply(files, read.csv)
If you want to be really quick about it, you can grab fread from data.table:
library(data.table)
listContainingAllFiles = lapply(files, fread)
Either way, this gives you a list of data frames, kept separate. If you want to join them together vertically or horizontally, then:
do.call(rbind, listContainingAllFiles)
do.call(cbind, listContainingAllFiles)
EDIT: note that the cbind() version only makes sense if corresponding rows in different files actually refer to the same thing. It usually makes far more sense to add a column recording which location each row came from.
If you want to use the file names to identify the sample location (I don't see where you're getting this information from in your example), do it as you read the files in:
listContainingAllFiles = lapply(files,
function(file) data.frame(filename = file,
read.csv(file)))
Then you can later split that column to extract the details (assuming, of course, that you have a standard naming convention).
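A sketch of the question's loop without setwd(), building each path with file.path() instead. It assumes each fitted-values file is <site>/<site><suffix> inside the main directory; the suffix "_fitted_values.csv" is a placeholder for the real file name, and the three helper names are hypothetical. It produces both layouts discussed above: wide (one block of columns per site, e.g. 100 columns for 50 two-column files) and long (rows stacked with a site column).

```r
# Read "<main_dir>/<site>/<site><fitted_suffix>" for every <site>.csv in main_dir.
read_fitted_by_site <- function(main_dir, fitted_suffix = "_fitted_values.csv") {
  sites <- tools::file_path_sans_ext(list.files(main_dir, pattern = "\\.csv$"))
  fitted <- lapply(sites, function(s) {
    read.csv(file.path(main_dir, s, paste0(s, fitted_suffix)))
  })
  setNames(fitted, sites)
}

# Wide layout: one block of columns per site, each prefixed with the site name.
# Requires every site file to have the same number of rows.
to_wide <- function(fitted) {
  prefixed <- Map(function(d, s) setNames(d, paste(s, names(d), sep = "_")),
                  fitted, names(fitted))
  do.call(cbind, unname(prefixed))
}

# Long layout: rows stacked, with a site column identifying the source.
to_long <- function(fitted) {
  stacked <- Map(function(d, s) data.frame(site = s, d, stringsAsFactors = FALSE),
                 fitted, names(fitted))
  out <- do.call(rbind, unname(stacked))
  rownames(out) <- NULL
  out
}
```

Write the resulting master file somewhere other than the main directory; otherwise it will be treated as another site the next time the script runs.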

How to use R to Iterate through Subfolders and bind CSV files of the same ID?

I am stuck. I need a way to iterate through a bunch of subfolders in a directory, pull out four .csv files from each, bind the contents of those four files, then write the new .csv to a new directory, using the name of the subfolder as the name of the new .csv.
I know R can do this, but I am stuck on how to iterate across the subfolders and bind the csv files together. My obstacle is that every subfolder contains the same four .csv files, named with the same 8-digit ids. For example, subfolder A contains 09061234.csv, 09061345.csv, 09061456.csv, and 09061560.csv; subfolder B contains 09061234.csv, 09061345.csv, 09061456.csv, and 09061560.csv (...). There are 42 subfolders, and hence 168 csv files sharing the same four names. I want to compact them down to 42 files.
I can use list.files to retrieve all the subfolders. But then what?
##Get Files from directory
TF = "H:/working/TC/TMS/Counts/June09"
##List Sub folders
SF <- list.files(TF)
##List of File names inside folders
FN <- list.files(SF)
#Returns list of 168 filenames
###?????###
#How to iterate through each subfolder, read each 8-digit integer id file,
#bind them all together into one single csv,
#Then write to new directory using
#the name of the subfolder as the name of the new csv?
There is probably an easy way to do this, but I am a noob with R. Something involving functions, paste and write.table, perhaps? Any hints/help/suggestions are greatly appreciated. Thanks!
You can use the recursive = TRUE option of list.files():
lapply(c('1234', '1345', '1456', '1560'), function(x) {
  ## all files named 0906<id>.csv, in any subfolder of TF
  sources.files <- list.files(path = TF,
                              recursive = TRUE,
                              pattern = paste0("^0906", x, "\\.csv$"),
                              full.names = TRUE)
  ## read all files with the id and bind them
  dat <- do.call(rbind, lapply(sources.files, read.csv))
  ## write the combined file for this id
  write.csv(dat, paste0("agg", x, ".csv"), row.names = FALSE)
})
After some tweaking of agstudy's code, I came up with the solution I was ultimately after. There were a couple of missing pieces that are more due to the nature of my specific problem, so I am leaving agstudy's answer as "accepted".
Turns out a function really wasn't needed. At least not for now. If I need to perform this same task again, I will create a function out of it. For now, I can solve this particular problem without it.
Also, in my case I needed to handle any non-csv files that may live in the subfolders (some folders contain .xls files). Restricting the list.files() pattern to names ending in .csv skips them.
Code:
##Define directory path##
TF <- "H:/working/TC/TMS/Counts/June09"
##Define the list of ids to search for##
x <- c('1234', '1345', '1456', '1560')
##List files in all subfolders named "0906" + one of the ids + ".csv"##
##(only .csv files match, so the .xls files in some folders are skipped)##
sources.files <- list.files(TF, recursive = TRUE, full.names = TRUE,
                            pattern = paste0("^0906(", paste(x, collapse = "|"), ")\\.csv$"))
##Read every matching file and bind the rows together##
dat <- do.call(rbind, lapply(sources.files, read.csv))
write.table(dat, file = "H:/working/TC/TMS/June09Output/June09Batched.csv",
            row.names = FALSE, sep = ",")
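The solutions above group files by id across subfolders, while the question asked for one output file per subfolder, named after the subfolder. A minimal sketch of that, assuming the four files in each subfolder share the same columns; bind_by_subfolder is a hypothetical name, and the added id column (the file name without extension) is optional.

```r
# For each immediate subfolder of top_dir, bind its .csv files by rows and
# write the result to out_dir as "<subfolder name>.csv".
bind_by_subfolder <- function(top_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (d in list.dirs(top_dir, full.names = TRUE, recursive = FALSE)) {
    # only .csv files, so stray .xls files are never read
    csvs <- list.files(d, pattern = "\\.csv$", full.names = TRUE)
    if (length(csvs) == 0) next
    # keep the 8-digit id as a column, since the same ids recur in every folder
    parts <- lapply(csvs, function(f) {
      data.frame(id = tools::file_path_sans_ext(basename(f)), read.csv(f),
                 stringsAsFactors = FALSE)
    })
    dat <- do.call(rbind, parts)
    write.csv(dat, file.path(out_dir, paste0(basename(d), ".csv")),
              row.names = FALSE)
  }
  invisible(list.files(out_dir, pattern = "\\.csv$"))
}
```

For example, bind_by_subfolder(TF, "H:/working/TC/TMS/June09Output") would produce one file per subfolder, 42 in this case. Keep out_dir outside the top folder so it is not treated as another subfolder on a later run.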
