I am trying to write multiple dataframes to multiple .csv files dynamically. I have found online how to create the .csv files dynamically, but not how to dynamically define the dataframe to write.
# create separate dataframes from each 12 month interval of closed age
for (i in 1:max_age) {
  assign(paste("closed", i*12, sep=""),
         mc_masterc[mc_masterc[,7]==i*12,])
  write.csv(paste("closed", i*12, sep=""),
            paste("closed", i*12, ".csv", sep=""),
            row.names=FALSE)
}
In the code above, the problem is with the first part of the write.csv statement. It will create the .csv file dynamically, but not with the actual content from the table I am trying to specify. What should the first argument of the write.csv statement be? Thank you.
The first argument of write.csv needs to be an R object, not a string. If you don't need the objects in memory you can do it like so:
for (i in 1:max_age) {
  df <- mc_masterc[mc_masterc[,7]==i*12,]
  write.csv(df, paste("closed", i*12, ".csv", sep=""),
            row.names=FALSE)
}
and if you need them in memory, you can either do that separately, or use get to return an object based on a string. Separately:
for (i in 1:max_age) {
  df <- mc_masterc[mc_masterc[,7]==i*12,]
  assign(paste("closed", i*12, sep=""), df)
  write.csv(df, paste("closed", i*12, ".csv", sep=""),
            row.names=FALSE)
}
With get:
for (i in 1:max_age) {
  assign(paste("closed", i*12, sep=""), mc_masterc[mc_masterc[,7]==i*12,])
  write.csv(get(paste("closed", i*12, sep="")),
            paste("closed", i*12, ".csv", sep=""),
            row.names=FALSE)
}
Related
I want to manipulate different .csv files through a loop and a list. This works fine, but for the output I have to create many .xlsx files, and the files have to be named according to the value of a certain variable.
I've already tried piping into the write_xlsx function with an ifelse condition, like:
for (i in 1:length(files)) {
files[[i]] %>%
write_xlsx(files[[i]], paste(ifelse(x="test1", "/Reportings/test1.xlsx",
ifelse(x="test2", "/Reportings/test2.xlsx", "test3")
}
I expect that multiple .xlsx files will be created in the folder Reportings.
Not easy to answer precisely with the information you gave, but here is a minimal example that seems to do what you want. It assumes that your list is composed of data frames, that x is a variable in each, and that x always has the same value within a file.
library(openxlsx)  # provides write.xlsx

df = data.frame(x=rep("test1",3), y=rep("test1",3))
df2 = data.frame(x=rep("test2",3), y=rep("test2",3))
files = list(df, df2)
files[[1]]$x[1]

for (i in 1:length(files)) {
  write.xlsx(files[[i]], paste0("Reportings/", files[[i]]$x[1], ".xlsx"))
}
I am using purrr::walk to read multiple Excel files and it failed. I have 3 questions:
(1) I used the function list.files to read the Excel file list in one folder, but the returned values also included the subfolders. I tried setting the parameters recursive= and include.dirs=, but it didn't work.
setwd(file_path)
files<-as_tibble(list.files(file_path,recursive=F,include.dirs=F)) %>%
filter(str_detect(value,".xlsx"))
files
(2) When I used the following piece of code, it can run without any error or warning message, but there is no returned data.
###read the excel data
file_read <- function(value1) {
print(value1)
file1<-read_excel(value1,sheet=1)
}
walk(files$value,file_read)
When I used the following, it worked. Not sure why.
test<-read_excel(files$value,sheet=1)
(3) In Q2, actually I want to create file1 to file6, suppose there are 6 excel files. How can I dynamically assign the dataset name?
list.files has a pattern argument where you can specify what kind of files you are looking for. This will help you avoid the filter(str_detect(value,".xlsx")) step. Also, list.files only returns the files that are included in the main directory (file_path) and not its subdirectories, unless you specify recursive = TRUE.
library(readxl)
setwd(file_path)
files <- list.files(pattern = '\\.xlsx')
walk is meant for side effects and discards return values, which is why nothing came back. You need to return the object from the function and collect the results with map:
file_read <- function(value1) {
  data <- read_excel(value1, sheet=1)
  return(data)
}
Now you can use map/lapply to read the files.
result <- purrr::map(files,file_read)
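For question (3), rather than creating file1 to file6 as separate objects, you can name the elements of the returned list after the files. A minimal sketch, with made-up file names standing in for your six Excel files:

```r
# hypothetical file names standing in for the real ones
files <- c("jan.xlsx", "feb.xlsx", "mar.xlsx")

# name each list slot after its file, minus the extension
result <- setNames(vector("list", length(files)),
                   tools::file_path_sans_ext(files))
names(result)
```

After result <- purrr::map(files, file_read), the same names(result) <- tools::file_path_sans_ext(files) step lets you refer to each dataset as result[["jan"]], result[["feb"]], and so on.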
I am trying to write a program to open a large amount of files and run them through a function I made called "sort". Every one of my file names starts with "sa1", however after that the characters vary based on the file. I was hoping to do something along the lines of this:
for(x in c("Put","Characters","which","Vary","by","File","here")){
sa1+x <- read.csv("filepath/sa1+x",header= FALSE)
sa1+x=sort(sa1+x)
return(sa1+x)
}
In this case, say that x was 88. It would open the file sa188, name that dataframe sa188, and then run it through the function sort. I don't think that writing sa1+x is the correct way to combine two values, but I don't know another way to do it.
You need to use a list to contain the data in each csv file, and loop over the filenames using paste0.
file_suffixes <- c("put","characters","which","vary","by","file","here")
numfiles <- length(file_suffixes)
list_data <- list()
sorted_data <- list()
filename <- "filepath/sa1"
for (x in 1:numfiles) {
  list_data[[x]] <- read.csv(paste0(filename, file_suffixes[x]), header=FALSE)
  sorted_data[[x]] <- sort(list_data[[x]])
}
I am not sure why you use return in that loop. If you're writing a function, you should be returning the sorted_data list which contains all your post-sorting data.
Note: you shouldn't call your function sort because there is already a base R function called sort.
Additional note: you can use dir() and regex parsing to find all the files which start with "sa1" and loop over all of them, thus freeing you from having to specify the file_suffixes.
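That last note can be sketched like so; the directory and file names here are made up for illustration, and dir() matches the pattern against the base file names:

```r
# throwaway directory with a few matching and non-matching files
tmp_dir <- file.path(tempdir(), "sa1demo")
dir.create(tmp_dir, showWarnings = FALSE)
file.create(file.path(tmp_dir, c("sa188.csv", "sa190.csv", "other.csv")))

# keep only files whose names start with "sa1"
sa1_files <- dir(tmp_dir, pattern = "^sa1", full.names = TRUE)
basename(sa1_files)
```

Looping over sa1_files then replaces the hand-written file_suffixes vector entirely.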
I tried to merge different tab-delimited files into a single file using the following R function. I save the merged file using the write.table command, and then need to read it back for further analysis. The biggest problem I am facing is that write.table automatically creates an extra column without any column name. I want to get rid of that column, as it hampers all further calculations.
combine = function(file) {
  split_list <- unlist(strsplit(file, split=","))
  setwd("D:/combine")
  dataset <- do.call("cbind", lapply(split_list, FUN=function(files) {
    read.table(files, header=TRUE, sep="\t")
  }))
  names(dataset)[1] = paste("Probe_ID")
  drop = c("ProbeID")
  dataset = dataset[, !(names(dataset) %in% drop)]
  dataset$X = NULL
  write.table(dataset, file="D:/output/illumina.txt", sep="\t", col.names=NA)
  return("illumina.txt")
}
Use the argument row.names=FALSE in write.table.
As @James says, or use row.names=1 in read.table() to indicate that the first column holds the row identifiers when reading the table back into R.
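Both suggestions can be sketched on a tiny example; the data and file names here are made up:

```r
# tiny hypothetical data frame standing in for the merged dataset
df <- data.frame(Probe_ID = c("p1", "p2"), value = c(1.5, 2.5))

# option 1: don't write row names at all
write.table(df, "illumina_demo.txt", sep = "\t", row.names = FALSE)
back1 <- read.table("illumina_demo.txt", header = TRUE, sep = "\t")

# option 2: if row names were written (col.names = NA), read them back
# as row identifiers instead of an unnamed extra column
write.table(df, "illumina_demo2.txt", sep = "\t", col.names = NA)
back2 <- read.table("illumina_demo2.txt", header = TRUE, sep = "\t",
                    row.names = 1)
```

Either way, the round-tripped table comes back with only the original two columns.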
My situation:
I have a number of .csv files that all share the same ending before the .csv extension, but the first two characters of the file name differ (i.e. AA01.csv, AB01.csv, AC01.csv, etc.)
I have an R script which I would like to run on each file. This file essentially extracts the data from the .csv and assigns them to vectors / converts them into timeseries objects. (For example, AA01 xts timeseries object, AB01 xts object)
What I would like to achieve:
Embed the script within a larger loop (or as appropriate) to sequentially run over each file and apply the script
Remove the intermediate objects created (see code snippet below)
Leave me with the final xts objects created from each raw data file (ie AA01 to AC01 etc as Values / Vectors etc)
What would be the right way to embed this script in R? Sorry, but I am a programming noob!
My script code is below. The headings of the columns in each CSV are DATE, TIME, VALUE.
# Pull in Data from the FileSystem and attach it
AA01raw<-read.csv("AA01.csv")
attach(AA01raw)
#format the data for timeseries work
cdt<-as.character(Date)
ctm<-as.character(Time)
tfrm<-timeDate(paste(cdt,ctm),format ="%Y/%m/%d %H:%M:%S")
val<-as.matrix(Value)
aa01tsobj<-timeSeries(val,tfrm)
#convert the timeSeries object to an xts object
aa01xtsobj<-as.xts(aa01tsobj)
#remove all the intermediate objects to leave the final xts object
rm(cdt)
rm(ctm)
rm(aa01tsobj)
rm(tfrm)
gc()
and then repeat on each .csv file until all xts objects are extracted.
ie, what we would end up within R, ready for further applications are:
aa01xtsobj, ab01xtsobj, ac01xtsobj....etc
any help on how to do this would be very much appreciated.
Be sure to use R's dir command to produce the list of filenames instead of entering them manually.
filenames = dir(pattern = "01\\.csv$")
for (i in 1:length(filenames)) {
  ...
}
I find a for loop and lists are well enough for stuff like this. Once you have a working set of code, it's easy enough to move from a loop into a function which can be sapply-ed or similar, but that kind of vectorization is idiosyncratic anyway and probably not useful outside of private one-liners.
You probably want to avoid assigning to multiple objects with different names in the workspace (this a FAQ which usually comes up as "how do I assign() . . .").
Please beware my untested code.
A vector of file names, and a list with a named element for each file.
files <- c("AA01.csv", "AA02.csv")
lst <- vector("list", length(files))
names(lst) <- files
Loop over each file.
library(timeSeries)
library(xts)

for (i in 1:length(files)) {
  ## read strings as character
  tmp <- read.csv(files[i], stringsAsFactors = FALSE)
  ## convert to 'timeDate' (the columns are DATE, TIME, VALUE)
  tmp$tfrm <- timeDate(paste(tmp$DATE, tmp$TIME), format = "%Y/%m/%d %H:%M:%S")
  ## create timeSeries object
  obj <- timeSeries(as.matrix(tmp$VALUE), tmp$tfrm)
  ## store object in the list, by name
  lst[[files[i]]] <- as.xts(obj)
}
## clean up
rm(tmp, files, obj)
Now all the read objects are in lst, but you'll want to test that the file is available, that it was read correctly, and you may want to modify the names to be more sensible than just the file name.
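The availability check mentioned above can be sketched with file.exists; the file names and the stand-in data here are illustrative:

```r
files <- c("AA01.csv", "AB01.csv")

# create one of the two files so the check has something to find
write.csv(data.frame(DATE = "2020/01/01", TIME = "00:00:00", VALUE = 1),
          "AA01.csv", row.names = FALSE)

# warn about and drop any files that are not on disk
missing <- files[!file.exists(files)]
if (length(missing) > 0)
  warning("skipping missing files: ", paste(missing, collapse = ", "))
files <- files[file.exists(files)]
```

Running this filter before the read loop means read.csv is only ever called on files that actually exist.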
Print out the first object by name index from the list:
lst[[files[1]]]