I use assign in a for loop to batch-read all the .csv files in a working directory. I then use substr to clean the file names. I would like to add a column holding the file name to each data frame, for easier analysis later in the code. However, I am having trouble referencing the data frame inside the for loop, after the names have been cleaned, in order to add the column.
#read in all files in folder
files <- list.files(pattern = "*.csv")
for (i in 1:length(files)){
assign(substr(files[i], start = 11, stop = nchar(files[i])-4), #clean file names
read.csv(files[i], stringsAsFactors = FALSE))
substr(files[i], start = 11, stop = nchar(files[i])-4)['FileFrom'] <- files[i]
}
assign does not seem to be the right function here; I think you need to use eval(parse()) on a command string that you build up. The inline comments explain more:
# read in all files in folder
files <- list.files(pattern = "*.csv")
# loop through the files
for (i in 1:length(files)){
# save the clean filename as a char var because it will be called again
fnClean = substr(files[i], start = 1, stop = nchar(files[i])-4)
# create a cmd as a string to be parsed and evaluated on-the-fly
# the point here is that you can use the 'fnClean' var in the string
# without knowing what it is (note: assign(fnClean, ...) would also accept
# the name from a var, but adding the column below still needs the object
# to be referenced by that name)
loadFileCMD = paste0(fnClean,' = read.csv(files[i], stringsAsFactors = FALSE)')
print(loadFileCMD) # check the cmd
eval(parse(text=loadFileCMD))
# create another string command to be evaluated to insert the file name
# to the 'FileFrom' field
addFnCMD = paste0(fnClean,'$FileFrom = files[i]')
print(addFnCMD) # check the cmd
eval(parse(text=addFnCMD))
}
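For what it's worth, assign() does accept a name held in a variable, so the same result can be had without eval(parse()). Below is a minimal sketch, assuming the clean name is simply the file name minus its extension (the question's start = 11 offset is left out). read_with_source() and read_all_csv() are hypothetical helpers, not part of the code above. Keeping the data frames in a named list is a common alternative to creating one global object per file:

```r
# Hypothetical helper: read one csv and record which file it came from
read_with_source <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  # rep() keeps this safe for files that contain a header but no rows
  df$FileFrom <- rep(basename(path), nrow(df))
  df
}

# Hypothetical helper: read every csv in `dir` into a named list of data frames
read_all_csv <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  out <- lapply(files, read_with_source)
  # Clean names: file name without the .csv extension
  names(out) <- tools::file_path_sans_ext(basename(files))
  out
}

# all_data <- read_all_csv()
# If separate objects are really wanted, assign() works with a name in a variable:
# for (nm in names(all_data)) assign(nm, all_data[[nm]])
```

Each data frame is then available as all_data[["cleanname"]], and the FileFrom column is already in place.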
Would this work?
#read in all files in folder
files <- list.files(pattern = "*.csv")
filesCopy <- files
for (i in 1:length(files)){
assign(substr(files[i], start = 11, stop = nchar(files[i])-4), #clean file names
read.csv(files[i], stringsAsFactors = FALSE))
substr(files[i], start = 11, stop = nchar(files[i])-4)['FileFrom'] <- filesCopy[i]
}
Related
The structure of my directory is as follows:
Extant_Data -> Data -> Raw
-> course_enrollment
-> frpm
I have a few different functions to read in some text files and Excel files, respectively.
read_fun = function(path){
test = read.delim(path, sep="\t", header=TRUE, fill = TRUE, colClasses = c(rep("character",23)))
test
}
read_fun_frpm= function(path){
test = read_excel(path, sheet = 2, col_names = frpm_names)
}
I feed this into map_dfr so that the function reads in each of the files and rowbinds them.
allfiles = list.files(path = "Extant_Data/Data/Raw/course_enrollment",
pattern = "CourseEnrollment.txt",
full.names=FALSE,
recursive = T)
# Rowbind all the course enrollment data
# !!! BUT I HAVE set the working directory to a subdirectory so that it finds those files
setwd("/Extant_Data/Data/Raw/course_enrollment")
course_combined <- map_dfr(allfiles,read_fun)
allfiles = list.files(path = "Extant_Data/Data/Raw/frpm/post12",
pattern = "frpm*",
full.names=FALSE,
recursive = T)
# Rowbind all the frpm data
# !!! I have to change the directory AGAIN
setwd("Extant_Data/Data/Raw/frpm/post12")
frpm_combined <- map_dfr(allfiles,read_fun_frpm)
As mentioned in the comments, I have to keep changing the working directory so that map_dfr can locate the files. I don't think this is best practice. How might I work around this so that I don't have to keep changing the directory? Any suggestions appreciated. Sorry, it's hard to provide a reproducible example.
Note: This throws an error.
frpm_combined <- map_dfr(allfiles,read_fun_frpm('Extant_Data/Data/Raw/frpm/post12'))
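One way around the repeated setwd() calls is to ask list.files() for full paths, so each reader receives a path that already resolves from the project root. Below is a minimal sketch, assuming the readers take a single path argument, as read_fun does. bind_files() is a hypothetical base-R stand-in for map_dfr(), which should accept the same vector of full paths. The error in the note above presumably comes from calling read_fun_frpm() on a directory instead of passing the function itself.

```r
# Hypothetical helper: read every matching file under `dir` and row-bind them
bind_files <- function(dir, pattern, reader) {
  paths <- list.files(path = dir, pattern = pattern,
                      full.names = TRUE,   # paths include `dir`, so no setwd()
                      recursive = TRUE)
  do.call(rbind, lapply(paths, reader))
}

# Usage, with the readers defined earlier (assumed to be in scope):
# course_combined <- bind_files("Extant_Data/Data/Raw/course_enrollment",
#                               "CourseEnrollment\\.txt$", read_fun)
# frpm_combined   <- bind_files("Extant_Data/Data/Raw/frpm/post12",
#                               "frpm", read_fun_frpm)
```

Note that the pattern argument is a regular expression, not a glob, so "frpm" is enough to match any file name containing frpm.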
I have 16 folders in the Risk/Archive/ folder, each named after a specific person, and I want to copy my Excel files (whose names also contain a person's name) from the Risk/ folder into the matching folder under Risk/Archive/.
I'm using the code below, but it doesn't do what I want.
f = list.files('Risk/')
d = list.dirs('Risk/Archive')
if (length(f) > 0) {
File = lapply(paste0('Risk/',f), function(i){
x <- read.xlsx(i, sheet = 1, startRow=2, colNames = TRUE, check.names = FALSE, cols = c(1:73))
file.copy(from=i, to='Risk/Archive/',
overwrite = TRUE, recursive = FALSE,copy.mode = TRUE)
x})
File <- do.call("rbind.data.frame", File)}
There might be a better way to do this, but if I understand correctly, I think this should do the trick:
# Get the names of the people (one sub-folder per person)
person_names <- list.dirs(path = "./Risk/Archive",
                          full.names = FALSE,
                          recursive = FALSE)
# Get list of files to copy
files <- list.files(path = "./Risk",
                    pattern = "\\.xlsx$",
                    full.names = TRUE)
# Loop through each name and move the file(s) for that person
for (cname in person_names) {
  # Files whose name contains the current person's name
  src <- files[grepl(cname, basename(files), fixed = TRUE)]
  # Skip people who have no file waiting in "Risk"
  if (length(src) == 0) next
  # Archive folder for the current name
  dest_dir <- file.path("./Risk/Archive", cname)
  # Copy from "Risk" to the person's "Archive" folder
  copied <- file.copy(from = src, to = dest_dir, overwrite = TRUE)
  # Remove only the originals that were copied successfully
  file.remove(src[copied])
  # Output message
  cat("Moved file for: ", cname, "\n", sep = "")
}
I am creating two data frames and one graph in RStudio. I wrote code to write them to separate sheets of an Excel file, but each time I have to choose the file path using file.choose(). Is it possible to store the file path in a variable when saving the file for the first time? If so, how can it be done?
I would also welcome comments on how to export my data frames to an Excel file more easily. My code is below.
Thank you to everyone.
dataframe1 <- data.frame("A"=1, "B"=2)
dataframe2 <- data.frame("C"=3,"D"=4)
list_of_datasets <- list("Name of DataSheet1" = dataframe1, "Name of Datasheet2" = dataframe2)
write.xlsx(list_of_datasets, file = "writeXLSX2.xlsx")
dflist <- list("Sonuçlar"=yazılacakdosya0, "Frame"=dtf, "Grafik"="")
edc <- write.xlsx(dflist, file.choose(new = T), colNames = TRUE,
borders = "surrounding",
firstRow = T,
headerStyle = hs)
require(ggplot2)
q1 <- qplot(hist(yazılacakdosya0$Puan))
print(q1)
insertPlot(wb=edc, sheet = "Grafik")
saveWorkbook(edc, file = file.choose(), overwrite = T)
Just save the file path in a variable before you call saveWorkbook:
file = file.choose()
saveWorkbook(edc, file = file, overwrite = T)
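If the path should be chosen once and then reused by every later save in the session, the variable can be wrapped in a small helper. Below is a minimal sketch. get_output_path() and path_cache are hypothetical names, and the chooser parameter is an assumption added so the helper can also run without a dialog:

```r
# Environment used to remember the chosen path for the rest of the session
path_cache <- new.env()

# Hypothetical helper: ask for a path the first time, reuse it afterwards.
# `chooser` defaults to the interactive dialog; it is a parameter so the
# function can also be driven non-interactively.
get_output_path <- function(chooser = function() file.choose(new = TRUE)) {
  if (is.null(path_cache$path)) {
    path_cache$path <- chooser()
  }
  path_cache$path
}

# saveWorkbook(edc, file = get_output_path(), overwrite = TRUE)
```

The dialog then appears only on the first call. Later calls return the stored path.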
I am new to R and I want to batch process all files in a working directory.
I have lots of .txt files and want to read them in, calculate the frequency of one column, calculate the percentage and a so-called "H-Score", calculate the sum of the H-Score and store it in a vector. Then the next .txt file should be processed, and so on.
After all files are processed, I want to write the vector to another .txt file as a result. The final .txt file should also contain the name of each input file and the calculated sum of its H-Score. This is what I have so far, but as you can see, I am an absolute newbie to programming and R...
setwd("~/Desktop/Automated Analysis/TXT/") # Set working directory
# List all txt files including sub-folders
list_of_files <- list.files(path = ".", recursive = TRUE,
pattern = "\\.txt$", full.names = TRUE)
library(data.table)
# Read all the files and create a FileName column to store filenames
DT <- rbindlist( sapply(list_of_files, fread, simplify = FALSE),
use.names = TRUE, idcol = "FileName" )
br = c(0,1,3,9,15,500) # Set breaks
bins = c(0,1,2,3,4) # Set bins
for (k in 1:length(list_of_files)) { # process all the files in the working directory
HScore_list = c() # create a vector for storing the results
for(i in 1:5) { my_vector = c(HScore_list,i) }
freq = hist(Count, breaks=br, plot=FALSE)
df = data.frame(bins, frequency=freq$counts,
df$percent=df$frequency / sum(df$frequency) * 100,
df$HScore=df$percent * df$bins)
HScore = sum(df$HScore)
}
write(HScore_list, "HScore_list.txt", sep="\n")
Do you know what I want and can help me?
EDIT: My problem is that the code produces no output.
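Two things stand out in the loop above: it never reads a file (Count is not defined inside the loop), and HScore_list is reset on every iteration, so nothing accumulates. Below is a minimal sketch of one way to structure it, assuming each file is tab-delimited with a header and a numeric column named Count whose values lie between 0 and 500. h_score() and score_files() are hypothetical names:

```r
br   <- c(0, 1, 3, 9, 15, 500)  # bin edges, as in the question
bins <- c(0, 1, 2, 3, 4)        # weight of each bin

# H-Score of one vector of counts: sum over bins of (percent in bin * bin weight)
h_score <- function(counts) {
  freq <- hist(counts, breaks = br, plot = FALSE)$counts
  percent <- freq / sum(freq) * 100
  sum(percent * bins)
}

# One row per input file: file name and its H-Score
# (assumption: tab-delimited files with a header row and a Count column)
score_files <- function(paths) {
  scores <- vapply(paths, function(p) {
    h_score(read.delim(p)$Count)
  }, numeric(1))
  data.frame(FileName = basename(paths), HScore = unname(scores))
}

# paths <- list.files(".", pattern = "\\.txt$", recursive = TRUE, full.names = TRUE)
# write.table(score_files(paths), "HScore_list.txt", sep = "\t",
#             row.names = FALSE, quote = FALSE)
```

Writing the result outside the input folder avoids picking it up as an input file on a later run.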
I have a list of approximately 500 csv files, each with a filename that consists of a six-digit number followed by a year (e.g. 123456_2015.csv). I would like to append all files together that have the same six-digit number. I tried to implement the code suggested in the question "Import and rbind multiple csv files with common name in R", but I want the appended data to be saved as new csv files in the same directory as the original files. I have also tried the code below; however, the csv files it produces contain no data.
rm(list=ls())
filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test")
NAPS_ID <- gsub('.+?\\([0-9]{5,6}?)\\_.+?$', '\\1', filenames)
Unique_NAPS_ID <- unique(NAPS_ID)
n <- length(Unique_NAPS_ID)
for(j in 1:n){
curr_NAPS_ID <- as.character(Unique_NAPS_ID[j])
NAPS_ID_pattern <- paste(".+?\\_(", curr_NAPS_ID,"+?)\\_.+?$", sep = "" )
NAPS_filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test", pattern = NAPS_ID_pattern)
write.csv(do.call("rbind", lapply(NAPS_filenames, read.csv, header = TRUE)),file = paste("C:/Users/smithma/Desktop/PM25_test/MERGED", "MERGED_", Unique_NAPS_ID[j], ".csv", sep = ""), row.names=FALSE)
}
Any help would be greatly appreciated.
Because you're not doing any data manipulation, you don't need to treat the files like tabular data. You only need to copy the file contents.
filenames <- list.files("C:/Users/smithma/Desktop/PM25_test", full.names = TRUE)
NAPS_ID <- substr(basename(filenames), 1, 6)
Unique_NAPS_ID <- unique(NAPS_ID)
for(curr_NAPS_ID in Unique_NAPS_ID){
NAPS_filenames <- filenames[startsWith(basename(filenames), curr_NAPS_ID)]
output_file <- paste0(
"C:/Users/smithma/Desktop/PM25_test/MERGED_", curr_NAPS_ID, ".csv"
)
for (fname in NAPS_filenames) {
line_text <- readLines(fname)
# Write the header from the first file
if (fname == NAPS_filenames[1]) {
cat(line_text[1], '\n', sep = '', file = output_file)
}
# Append every line in the file except the header
line_text <- line_text[-1]
cat(line_text, file = output_file, sep = '\n', append = TRUE)
}
}
My changes:
list.files(..., full.names = TRUE) is usually the best way to go.
Because the digits appear at the start of the filenames, I suggest substr. It's easier to get an idea of what's going on when skimming the code.
Instead of looping over the indices of a vector, loop over the values. It's more succinct and less likely to cause problems if the vector's empty.
startsWith and endsWith are relatively new functions, and they're great.
You only care about copying lines, so just use readLines to get them in and cat to get them out.
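The point about looping over values can be seen with an empty vector: 1:length(x) counts down from 1 to 0 rather than producing an empty sequence. A small illustration follows. count_iterations() is a hypothetical helper used only to make the difference visible:

```r
empty <- character(0)

# Hypothetical helper: how many times would a for loop over `s` run?
count_iterations <- function(s) {
  n <- 0
  for (item in s) n <- n + 1
  n
}

count_iterations(1:length(empty))   # 2: the loop visits 1, then 0
count_iterations(seq_along(empty))  # 0: safe when indices are needed
count_iterations(empty)             # 0: looping over the values directly
```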
You might consider something like this:
##will take the first 6 characters of each file name
six.digit.filenames <- substr(filenames, 1,6)
path <- "C:/Users/smithma/Desktop/PM25_test/"
unique.numbers <- unique(six.digit.filenames)
for(j in unique.numbers){
sub <- filenames[which(substr(filenames,1,6) == j)]
data.for.output <- c()
for(file in sub){
##now do your stuff with these files including read them in
data <- read.csv(paste0(path,file))
data.for.output <- rbind(data.for.output,data)
}
write.csv(data.for.output,paste0(path,j, '.csv'), row.names = F)
}
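The grouping step can also be done with split(), which avoids growing a data frame inside the inner loop. Below is a minimal sketch, assuming every file name starts with six digits and an underscore, and that all files for one ID share the same columns. merge_by_prefix() and the MERGED_ output prefix are hypothetical choices:

```r
# Hypothetical helper: merge csv files in `dir` that share the same 6-digit prefix
merge_by_prefix <- function(dir, out_dir = dir) {
  # The pattern only matches names like 123456_2015.csv, so MERGED_ files
  # written by an earlier run are not read back in
  paths <- list.files(dir, pattern = "^[0-9]{6}_.*\\.csv$", full.names = TRUE)
  groups <- split(paths, substr(basename(paths), 1, 6))
  for (id in names(groups)) {
    merged <- do.call(rbind, lapply(groups[[id]], read.csv))
    write.csv(merged, file.path(out_dir, paste0("MERGED_", id, ".csv")),
              row.names = FALSE)
  }
  invisible(names(groups))
}

# merge_by_prefix("C:/Users/smithma/Desktop/PM25_test")
```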