I want to use a csv file with old file names and their new file names to rename files in a directory using R. I am quite the amateur in R but I'm hoping someone can help.
I've tried to create two character vectors with the names to compare. What I want is a kind of if/then statement: "if this file is named x, then rename it to y".
FileNames <- read.csv("New_FileNamesCSV.csv")
oldNames <- as.character(FileNames$CurrentFileName)
NewNames <- as.character(FileNames$NewFileName)
setwd("F:/Workspace/ (Local)/FileWork/Files")
file.rename(from = oldNames, to = NewNames)
This is the error I am getting:
Error in file.rename(from = oldNames, to = NewNames) : object
'oldNames' not found
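An "object not found" error usually means the line that assigns the object never ran successfully in the current session, so it is worth re-running the assignments before the call to file.rename(). The following is a minimal sketch of the mapping-based rename, not a definitive fix. The column names CurrentFileName and NewFileName come from the question; the directory and file names are made up, and everything happens in a temporary directory. Building full paths with file.path() avoids depending on setwd().

```r
# Work in a throwaway directory so the sketch does not touch real files.
work_dir <- file.path(tempdir(), "rename_demo")
dir.create(work_dir, showWarnings = FALSE)

# Made-up files standing in for the real ones.
file.create(file.path(work_dir, c("a.txt", "b.txt")))

# Made-up mapping table; in practice this would come from read.csv(),
# ideally with stringsAsFactors = FALSE so the names are plain character.
mapping <- data.frame(
  CurrentFileName = c("a.txt", "b.txt", "missing.txt"),
  NewFileName     = c("alpha.txt", "beta.txt", "gamma.txt"),
  stringsAsFactors = FALSE
)

old_paths <- file.path(work_dir, mapping$CurrentFileName)
new_paths <- file.path(work_dir, mapping$NewFileName)

# Only rename files that actually exist. file.rename() is vectorised and
# returns one TRUE/FALSE per (from, to) pair.
present <- file.exists(old_paths)
renamed <- file.rename(from = old_paths[present], to = new_paths[present])

print(sort(list.files(work_dir)))
```

The vectorised call already behaves like "if this file is named x, rename it to y" for every row of the table, so no explicit if/then loop is needed.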
I need your help; I am new to R.
The scenario: I have a list of SAS datasets in a specific location.
path <- 'C:\\XXXX\\XXX'
files <- list.files(path = path,pattern="*.sas7bdat", full.names=FALSE)
The files variable gives the list of file names available in that directory.
I want to use each file name, with the extension removed by a split function and stored in the domain_name variable, as the name of a data frame.
I am iterating over each file name, which is a SAS dataset, importing it and creating each dataset name dynamically (for instance, if there are 30 SAS datasets, 30 R data frames should be created).
library(haven)
for (i in 1:length(files)){
domain_name=strsplit(i,split='.sas7bdat', fixed=TRUE)
domain_name <- read_sas(data_file=paste(path,i,sep='/'))
}
Could you explain the concept and fix this problem?
Thanks in advance
The following should in principle work. As there is no real example I can only guess.
path <- 'C:/path2file/'
print(path)
files <- list.files(path = path, pattern="*.sas7bdat", full.names=FALSE)
print(files)
mydf <- list()
for (i in 1:length(files)){
filename <- paste0(path, files[i])
print(filename)
# browser() # if you like to step through the file
mydf[[i]] <- haven::read_sas(data_file=filename)
print(names(mydf[[i]]))
eval(parse(text = paste0("mydf_", i, " <- haven::read_sas(data_file=filename)")))
}
Then you can access each data.frame via e.g. df1 <- mydf[[1]]
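As a possible alternative to creating numbered variables with eval(parse()), the list can be named after the files, so each data frame is reachable by its domain name. This is only a sketch under stated assumptions: read.csv() stands in for haven::read_sas() so the example runs without SAS files, and the directory and file names (dm, ae) are made up.

```r
# Stand-in data: two small CSV files in a temporary directory.
# (Assumption: read.csv() replaces haven::read_sas() so the sketch runs
# without SAS files; swap the reader and pattern for real .sas7bdat data.)
data_dir <- file.path(tempdir(), "domain_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2), file.path(data_dir, "dm.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:5), file.path(data_dir, "ae.csv"), row.names = FALSE)

# Full paths, matched with a regular expression rather than a glob.
paths <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, then name each list element after its file
# (directory and extension removed).
datasets <- lapply(paths, read.csv)
names(datasets) <- tools::file_path_sans_ext(basename(paths))

# Access by name instead of by position.
print(names(datasets))
print(nrow(datasets[["ae"]]))
```

If separate variables really are required, list2env(datasets, envir = globalenv()) would create one object per list element, though keeping them in a list is usually easier to loop over.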
I have a directory (dir2 in the code below) with 200 subdirectory folders, each of which contains a .txt data file and a Setup.sas file. I need to write a for loop that uses the asciiSetupReader package to loop over each subdirectory, read the files therein using the sas_ascii_reader function and row-bind all the resulting read objects to one data frame. I know this must be relatively simple but am having difficulties.
I have generated a dataframe that has two columns: one of the list of file names of the .txt files and another of the list of accompanying Setup.sas files.
list_file_txt <- list.files(path = './dir1/dir2',
pattern='*Data.txt',
recursive=TRUE)
list_file_sas <- list.files(path = './dir1/dir2',
pattern='*Setup.sas',
recursive=TRUE)
files <- as.data.frame(cbind(list_file_txt,list_file_sas))
files <- files %>%
mutate(directory = str_sub(list_file_txt,1,7),
directory = paste0('/dir1/dir2/',directory))
I have attempted:
for (i in 1:nrow(files)) {
setwd(files$directory)
sas_ascii_reader(dataset_name = '*Data.txt',
sas_name = '*Setup.sas',
real_names = FALSE)
}
This results in the error:
Error in setwd(files$directory) : cannot change working directory
which I understand indicates that R is not recognizing the character strings in the files$directory column as file paths to reference.
I have also tried (as referenced at How to import files from subdirectories and name them with subdirectory name R)
library(tidyverse)
tbl <-
list.files(path = './dir1/dir2',
recursive=TRUE) %>%
map_dfr(sas_ascii_reader,
dataset_name = '*Data.txt',
sas_name = '*Setup.sas',
.id = "filepath")
but get
Error in .f(.x[[i]], ...) : is.logical(value_label_fix) is not TRUE
which I don't understand at all.
Any help would be appreciated. Thanks all.
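One thing that may be worth checking: setwd() expects a single path, whereas files$directory is a whole column, and the paths built above start with '/dir1' rather than './dir1'. The sketch below avoids setwd() altogether by asking list.files() for full paths and walking over the data/setup pairs. It is not a tested solution for asciiSetupReader: read_pair() is a hypothetical stand-in that uses read.csv() so the example runs on its own, and the directory tree is made up. In real use the body of read_pair() would presumably call sas_ascii_reader(dataset_name = data_path, sas_name = setup_path), with the argument names taken from the question.

```r
# Build a made-up tree: two subdirectories, each with a data file and a
# setup file.
root <- file.path(tempdir(), "dir2_demo")
for (d in c("sub_001", "sub_002")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(x = 1:2), file.path(root, d, "Data.txt"), row.names = FALSE)
  writeLines("placeholder setup", file.path(root, d, "Setup.sas"))
}

# full.names = TRUE returns usable paths, so no setwd() is needed.
txt_files <- list.files(root, pattern = "Data\\.txt$", recursive = TRUE, full.names = TRUE)
sas_files <- list.files(root, pattern = "Setup\\.sas$", recursive = TRUE, full.names = TRUE)
stopifnot(length(txt_files) == length(sas_files))

# Hypothetical stand-in reader: takes ONE data path and ONE setup path.
# It ignores the setup file and records which subdirectory the rows came from.
read_pair <- function(data_path, setup_path) {
  out <- read.csv(data_path)
  out$source_dir <- basename(dirname(data_path))
  out
}

# Map() calls read_pair() once per (data, setup) pair.
pieces <- Map(read_pair, txt_files, sas_files)

# Row-bind all pieces into one data frame.
combined <- do.call(rbind, unname(pieces))
rownames(combined) <- NULL
print(combined)
```

Pairing by position assumes both listings sort the subdirectories in the same order, which holds here because each subdirectory contains exactly one file of each kind.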
I have a folder (folder 1) containing multiple csv: "x.csv", "y.csv", "z.csv"...
I want to extract the 3rd column of each file and then write new csv files in a new folder (folder 2). Hence, folder 2 must contain "x.csv", "y.csv", "z.csv"...(but with just the 3rd column).
I tried this:
dfiles <- list.files(pattern =".csv") #if you want to read all the files in working directory
lst2 <- lapply(dfiles, function(x) (read.csv(x, header=FALSE)[,3]))
But I got this error:
Error in `[.data.frame`(read.csv(x, header = FALSE), , 3) :
undefined columns selected
Moreover, I don't know how to write multiple csv.
However, if I do this with one file, it works properly, although the output is written to the same folder:
essai <-read.csv("x.csv", header = FALSE, sep = ",")[,3]
write.csv (essai, file = "x.csv")
Any help would be appreciated.
So here's how I would do it. There may be a nicer and more efficient way, but it should still work pretty well.
setwd("~/stackexchange") #set your main folder. Best way to do this is actually the here() package. But that's another topic.
library(tools) #for file extension tinkering
folder1 <- "folder1" #your original folder
folder2 <- "folder2" #your new folder
#I setup a function and loop over it with lapply.
write_to <- function(file.name){
file.name <- paste0(tools::file_path_sans_ext(basename(file.name)), ".csv")
essai <-read.csv(paste(folder1, file.name, sep = "/"), header = FALSE, sep = ",")[,3]
write.csv(essai, file = paste(folder2, file.name, sep="/"))
}
# get file names from folder 1
dfiles <- list.files(path = folder1, pattern = "\\.csv$") # all the csv files in the folder1 directory (pattern is a regular expression, not a glob)
lapply(X = paste(folder1, dfiles, sep="/"), write_to)
Have fun!
Btw: if you have many files, you could use data.table::fread and data.table::fwrite which improves csv reading/writing speed by a lot.
First of all, from the error message it seems that some of the csv files have fewer than 3 columns. Check that you are reading the correct files and that all of them are supposed to have at least 3 columns.
Once you do that you can use the below code, to read the csv file, select the 3rd column and write the csv file in 'folder2'.
lapply(dfiles, function(x) {
df <- read.csv(x, header = FALSE)
write.csv(subset(df, select = 3), paste0('folder2/', x), row.names = FALSE)
})
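The column check suggested above could be sketched as follows. The folder and file names are made up and live in a temporary directory; the idea is simply to count the columns of every file first and only process the ones that have a third column.

```r
# Made-up input and output folders in a temporary location.
in_dir  <- file.path(tempdir(), "folder1_demo")
out_dir <- file.path(tempdir(), "folder2_demo")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# x.csv has three columns, y.csv only two (this one would trigger the
# "undefined columns selected" error).
write.table(data.frame(1:2, 3:4, 5:6), file.path(in_dir, "x.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(data.frame(1:2, 3:4), file.path(in_dir, "y.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

csv_paths <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

# Count the columns of each file.
n_cols <- vapply(csv_paths,
                 function(p) ncol(read.csv(p, header = FALSE)),
                 integer(1))
names(n_cols) <- basename(csv_paths)
print(n_cols)

# Keep the 3rd column of the files that have one, under the same file name.
for (p in csv_paths[n_cols >= 3]) {
  third <- read.csv(p, header = FALSE)[, 3, drop = FALSE]
  write.csv(third, file.path(out_dir, basename(p)), row.names = FALSE)
}
```

Files with too few columns are skipped here; whether they should instead raise an error depends on what the data is supposed to look like.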
For the "write" portion of this question, I had some luck using map2() in purrr. I'm not sure this is the most elegant solution but here it goes:
listofessais # this is your .csv files together as a named list of tbls
purrr::map2(listofessais, names(listofessais), ~ readr::write_csv(.x, glue::glue("FilePath/{.y}.csv")))
That should give you all your .csv files exported in that folder, and named with the same names they were given in the list.
I am pretty new to R and programming, so I do apologise if this question has been asked elsewhere.
I'm trying to load multiple .csv files, edit them and save them again, but I cannot find out how to manage more than one .csv file and also name the new files based on a list of character strings.
So I have .csv file and can do:
species_name<-'ace_neg'
{species<-read.csv('species_data/ace_neg.csv')
species_1_2<-species[,1:2]
species_1_2$species<-species_name
species_3_2_1<-species_1_2[,c(3,1,2)]
write.csv(species_3_2_1, file='ace_neg.csv',row.names=FALSE)}
But I would like to run this code for all .csv files in the folder and add text to a new column based on .csv file name.
So I can load all .csv files and make a list of character strings for use as a new column text and as new file names.
NDOP_files <- list.files(path="species_data", pattern="*.csv$", full.names=TRUE, recursive=FALSE)
short_names<- substr(NDOP_files, 14,20)
Then I tried:
lapply(NDOP_files, function(x){
species<-read.csv(x)
species_1_2<-species[,1:2]
species_1_2$species<-'name' #don't know how to insert first character string of short_names instead of 'name', than second character string from short_names for second csv. file etc.
Then continue in the code to change an order of columns
species_3_2_1<-species_1_2[,c(3,1,2)]
And then write all new modified csv. files and name them again by the list of short_names.
I'm sorry if the text is somewhat confusing.
Any help or suggestions would be great.
You are actually quite close, and using lapply() is a really good idea.
As you state, the issue is that it only takes one list as an argument, but you want to work with two. mapply() is a function in base R that you can feed multiple lists into and cycle through in parallel. lapply() and mapply() are both designed to create or manipulate objects in R, but you want to write the files and are not interested in the output within R. The purrr package has the walk*() functions, which are useful when you want to cycle through lists and are only interested in creating side effects (in your case, saving files).
purrr::walk2() takes two lists, so you can provide the data and the
file names at the same time.
library(purrr)
First I create some example data (I’m basically already using the same concept here as I will below):
test_data <- map(1:5, ~ data.frame(
a = sample(1:5, 3),
b = sample(1:5, 3),
c = sample(1:5, 3)
))
walk2(test_data,
paste0("species_data/", 1:5, "test.csv"),
~ write.csv(.x, .y))
Instead of getting the file paths and then stripping away the path
to get the file names, I just call list.files(), once with full.names = TRUE and once with full.names = FALSE.
NDOP_filepaths <-
list.files(
path = "species_data",
pattern = "*.csv$",
full.names = TRUE,
recursive = FALSE
)
NDOP_filenames <-
list.files(
path = "species_data",
pattern = "*.csv$",
full.names = FALSE,
recursive = FALSE
)
Now I feed the two lists into purrr::walk2(). Using the ~ before the curly brackets, I can define the anonymous function a bit more elegantly and then use .x and .y to refer to the entries of the first and the second list.
walk2(NDOP_filepaths,
NDOP_filenames,
~ {
species <- read.csv(.x)
species <- species[, 1:2]
species$species <- sub("\\.csv$", "", .y) # escape the dot and anchor at the end so only the extension is removed
write.csv(species, .x)
})
Learn more about purrr at purrr.tidyverse.org.
Alternatively, you could just extract the file name in the loop and stick to lapply() or use purrr::map()/purrr::walk(), like this:
lapply(NDOP_filepaths,
function(x) {
species <- read.csv(x)
species <- species[, 1:2]
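species$species <- tools::file_path_sans_ext(basename(x)) # file name without directory or extension
write.csv(species, basename(x)) # written to the working directory, without the "species_data/" prefix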
})
NDOP_files <- list.files(path="species_data", pattern="*.csv$",
full.names=TRUE, recursive=FALSE)
# Get name of each file (without the extension)
# basename() removes all of the path up to and including the last path separator
# file_path_sans_ext() removes the .csv extension
csvFileNames <- tools::file_path_sans_ext(basename(NDOP_files))
Then, I would write a function that takes in 1 csv file and does some manipulation to the file and outputs out a data frame. Since you have a list of csv files from using list.files, you can use the map function in the purrr package to apply your function to each csv file.
library(purrr)
doSomething <- function(NDOP_file){
# your code here to manipulate NDOP_file to your liking
return(NDOP_file)
}
NDOP_files <- map(NDOP_files, ~doSomething(.x))
Lastly, you can manipulate the file names when you write the new csv files using csvFileNames and a custom function you write to change the file name to your liking. Essentially, use the same architecture of defining your custom function and using map to apply to each of your files.
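Putting the pieces together, the read / modify / write cycle could be sketched in base R with Map(), which steps through the paths and the short names in parallel. This is one possible arrangement, not the only one: the folder names, the second species name (bet_pen) and the column names lon, lat and extra are made up, and the output goes to a separate temporary folder so the inputs are not overwritten.

```r
in_dir  <- file.path(tempdir(), "species_data_demo")
out_dir <- file.path(tempdir(), "species_out_demo")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# Made-up input files, one per species.
for (nm in c("ace_neg", "bet_pen")) {
  write.csv(data.frame(lon = c(14.1, 14.2), lat = c(50.1, 50.2), extra = c("a", "b")),
            file.path(in_dir, paste0(nm, ".csv")), row.names = FALSE)
}

paths <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
# Derive the short names from the file names instead of using substr().
short_names <- tools::file_path_sans_ext(basename(paths))

# One file in, one modified data frame out.
add_species <- function(path, species_name) {
  species <- read.csv(path)[, 1:2]   # keep the first two columns
  species$species <- species_name    # new column taken from the file name
  species[, c(3, 1, 2)]              # species column first
}

# Map() pairs the i-th path with the i-th short name.
modified <- Map(add_species, paths, short_names)

# Write each result under its short name.
invisible(Map(function(df, nm) {
  write.csv(df, file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}, modified, short_names))

print(list.files(out_dir))
```

Deriving short_names with basename() keeps the names correct even if the folder name changes length, which substr(NDOP_files, 14, 20) would not.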
I need to create a function called PollutantMean with the following arguments: directory, pollutant, and id = 1:332.
I have most of the code written but I can't figure out how to assign my directory as a variable. My current working directory is C:/Users/User/Documents. I tried writing the variable as:
directory <- "C:/Users/User/specdata" and that didn't work.
Next I tried the following:
directory <- list.files("specdata", full.names=TRUE) and that didn't work either.
Any ideas on how to change this?
If you are trying to assign the values in your current working directory to the variable "directory" Why not take the simple method and add:
directory <- getwd()
getwd() returns the path of the current working directory as a character string (not the files it contains), and that string is assigned to the variable "directory".
I have already worked with directories as variables. I usually declare them like this:
directory<-"C://Users//User//specdata//"
To take your example: if I then want to read a specific file in this directory, I just do:
read.table(paste(directory,"myfile.txt",sep=""),...)
It's the same process to write in a file
write.table(res,file=paste(directory,"myfile.txt",sep=""),...)
Does this help?
EDIT: you can then use read.csv and it will work fine.
I think you are confused by the assignment operation in R. The following line
directory <- "C:/Users/User/specdata"
assigns a string to a new object that just happened to be called directory. It has the same effect on your working environment as
elephant <- "C:/Users/User/specdata"
To change where R reads its files, use the function setwd (short for set working directory):
setwd("C:/Users/User/specdata")
You can also specify full path names to functions that read in data (like read.table). For your specific problem,
# creates a character vector with the full paths of all files ending with `csv` (i.e. all csv files);
# full.names = TRUE is needed so read.csv can find them from any working directory
all.specdata.files <- list.files(path = "C:/Users/User/specdata", pattern = "csv$", full.names = TRUE)
# creates a list resulting from the application of `read.csv` to
# each of these files (which may be slow!!)
all.specdata.list <- lapply(all.specdata.files, read.csv)
Then we use dplyr::bind_rows (the replacement for rbind_all, which has been removed from dplyr) to row-bind them into one data frame.
library(dplyr)
all.specdata <- bind_rows(all.specdata.list)
Then use colMeans to determine the grand means. Not sure how to do this without seeing the data.
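To make the point about the directory argument concrete, here is a rough sketch of such a function. The directory is just a string that gets passed on to list.files(), so no setwd() is involved. Several things are assumptions for illustration only: the name pollutant_mean, the column name sulfate, the file names 001.csv and 002.csv, and the idea that id selects files by their position in the sorted listing.

```r
# Made-up monitor files in a temporary directory.
spec_dir <- file.path(tempdir(), "specdata_demo")
dir.create(spec_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3)), file.path(spec_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7)),     file.path(spec_dir, "002.csv"), row.names = FALSE)

pollutant_mean <- function(directory, pollutant, id) {
  # `directory` is an ordinary character string; it only matters because
  # it is handed to list.files() here.
  paths <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)

  # Assumption: the id-th file in the sorted listing is monitor `id`.
  selected <- lapply(paths[id], read.csv)

  # Pool the requested column across the selected files and ignore NAs.
  values <- unlist(lapply(selected, function(df) df[[pollutant]]))
  mean(values, na.rm = TRUE)
}

result <- pollutant_mean(spec_dir, "sulfate", id = 1:2)
print(result)
```

Calling it with a full path such as "C:/Users/User/specdata" works the same way, regardless of the current working directory.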
Assuming that the columns in each of the 300+ csv files are the same, that is, column j contains the same type of data in all files, the following example should be of use:
# let's use a temp directory for storing the files
tmpdr <- tempdir()
# Let's create a large matrix of values and then split it into many different
# files
original_data <- data.frame(matrix(rnorm(10000L), nrow = 1000L))
# write each row to a file
for(i in seq(1, nrow(original_data), by = 1)) {
write.csv(original_data[i, ],
file = paste0(tmpdr, "/", formatC(i, format = "d", width = 4, flag = 0), ".csv"),
row.names = FALSE)
}
# get a character vector with the full path of each of the files
files <- list.files(path = tmpdr, pattern = "\\.csv$", full.names = TRUE)
# read each file into a list
read_data <- lapply(files, read.csv)
# bind the read_data into one data.frame,
read_data <- do.call(rbind, read_data)
# check that our two data.frames are the same.
all.equal(read_data, original_data)
# [1] TRUE