I have created hundreds of data frames in R, and I want to export them to local files. All the names of the data frames are stored in a vector:
name.vec <- c('df1','df2','df3','df4','df5','df6')
Each name in name.vec refers to a data frame.
What I want to do is export those data frames as Excel files, but I do not want to do it like this:
library("xlsx")
write.xlsx(df1,file ="df1.xlsx")
write.xlsx(df2,file ="df2.xlsx")
write.xlsx(df3,file ="df3.xlsx")
because with hundreds of data frames, that is tedious and error-prone.
I want something like this instead:
library('xlsx')
for (k in name.vec) {
  write.xlsx(k, file = paste0(k, '.xlsx'))
}
but this does not work.
Does anyone know how to achieve this? Your time and knowledge would be deeply appreciated. Thanks in advance.
The first reason the for loop doesn't work is that it attempts to write a single name, 'df1' for example, as the xlsx file contents instead of the data frame itself. That's because name.vec stores only the names of the data frames, not the data frames themselves. So to fix the for loop, you'd have to do something more like this:
df.list <- list(df1, df2, df3)
name.vec <- c('df1','df2','df3')
library('xlsx')
for (k in seq_along(df.list)) {
  write.xlsx(df.list[[k]], file = paste0(name.vec[k], '.xlsx'))
}
However, a for loop is more verbose than it needs to be here. So here's a more compact way, using sapply:
sapply(seq_along(df.list),
       function(i) write.xlsx(df.list[[i]], file = paste0(name.vec[i], '.xlsx')))
The output is three xlsx files, one per data frame in df.list, named according to name.vec.
It may also be best to switch at some point to the newer writexl package, which does not depend on Java.
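For reference, here is a minimal sketch that works directly from name.vec, assuming the data frames exist in the global environment; mget() fetches them by name, so no hand-built list is needed:
library(writexl)
df.list <- mget(name.vec)  # look the data frames up by their names
for (k in names(df.list)) {
  write_xlsx(df.list[[k]], path = paste0(k, '.xlsx'))
}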
I am importing multiple Excel workbooks, processing them, and appending them successively. I want to create a temporary data frame (a tempfile?) that holds nothing in the beginning and gets appended to after each workbook is processed. How do I create such a temporary data frame at the start?
I am coming from Stata, where I use tempfile a lot. Is there a counterpart to Stata's tempfile in R?
As @James said, you do not need an empty data frame or a tempfile; simply append each newly processed data frame to the first one. Here is an example (based on CSV files, but the logic is the same):
list_of_files <- c('1.csv', '2.csv', ...)
pre_processor <- function(dataframe){
  # do stuff, then return the processed data frame
}
library(dplyr)
dataframe <- pre_processor(read.csv('1.csv')) %>%
  rbind(pre_processor(read.csv('2.csv'))) %>%
  ...
Now if you have a lot of files or very complicated pre-processing, you might have other questions (e.g. how to loop over the list of files, or how to write the right pre_processor function), but those should be separate questions, and we would really need more specifics (example data, code so far, etc.). One way to loop over the files is sketched below.
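A minimal sketch of that loop, assuming list_of_files holds the CSV paths and that pre_processor() returns a data frame:
# read, pre-process, and stack every file in one go
dataframe <- do.call(rbind, lapply(list_of_files,
                                   function(f) pre_processor(read.csv(f))))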
I am trying to remove bias from a microscopy analysis, so I want to make it so the experimenter doesn't know what the conditions are for the image they are looking at.
To do this I need to rename every file in a directory so they can't be identified, but I also need to be able to know what the original filename was subsequently.
I made a folder with three files in it to try this out. I got the file list, made a vector of new names, and combined the two into a data frame.
setwd("~/Desktop/folder1")
filename_list <- list.files("~/Desktop/folder1")
new_filenames <- c("anon1", "anon2", "anon3")
require(reshape2)
df1 <- melt(data.frame(filename_list, new_filenames))
View(df1)
I've also been able to change names using scripts from a previous question
and R-bloggers, using sapply and file.rename. I got a little stuck using wildcards to select the whole filename (minus the extension), but I'm sure it's possible:
sapply(filename_list, FUN = function(eachPath){
  file.rename(from = eachPath,
              to = sub(pattern = "image_", replacement = "anon", eachPath))
})
How can I feed the new_filenames vector to file.rename so that each new name corresponds to the original filename in the df1 data frame? Or is there a better way to do this? Thanks.
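One possible approach, sketched under the assumption that all files live in ~/Desktop/folder1 and should keep their original extensions; file.rename is vectorized, so the two vectors can be passed directly:
filename_list <- list.files("~/Desktop/folder1")
new_filenames <- paste0("anon", seq_along(filename_list), ".",
                        tools::file_ext(filename_list))
# store the original-to-anonymous key outside the blinded folder
key <- data.frame(original = filename_list, anonymous = new_filenames)
write.csv(key, "~/Desktop/key.csv", row.names = FALSE)
file.rename(from = file.path("~/Desktop/folder1", filename_list),
            to   = file.path("~/Desktop/folder1", new_filenames))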
I'm writing a script to plot data from multiple files. Each file is named using the same format, where strings between “.” give some info on what is in the file. For example, SITE.TT.AF.000.52.000.001.002.003.WDSD_30.csv.
These data will be from multiple sites, so SITE, or WDSD_30, or any other string, may differ depending on where the data are from, though its position in the file name will always indicate a specific feature such as location or measurement.
So far I have each file read into R and saved as a data frame named the same as the file. I'd like to get something like the following to work: if there is a data frame in the global environment that contains WDSD_30, then plot a specific column from that data frame. The column will always have the same name, so I could write plot(WDSD_30$meas), and no matter what site's files were loaded in the global environment, the script would find the WDSD_30 file and plot the meas variable. My goal for this script is to be able to point it to any folder containing files from a particular site, and no matter what the site, the script will be able to read in the data and find files containing the variables I'm interested in plotting.
A colleague suggested I try using strsplit() to break up the file name and extract the element I want to use, then use that to rename the data frame containing that element. I'm stuck on how exactly to do this or whether this is the best approach.
Here's what I have so far:
site.files <- basename(list.files(pattern = "\\.csv$", recursive = TRUE, full.names = FALSE))
sfsplit <- lapply(site.files, function(x) strsplit(x, ".", fixed = TRUE)[[1]])
for (i in 1:length(site.files)) assign(site.files[i], read.csv(site.files[i]))
for (i in 1:length(site.files))
  if (grepl("PARQL", sfsplit[[i]][10])) {
    assign("data.frame.getting.named.PARQL", sfsplit[[i]][10])
  } else if (grepl("IRBT", sfsplit[[i]][10])) {
    assign("data.frame.getting.named.IRBT", sfsplit[[i]][10])
  }
...and so on for each data frame I'd like to eventually plot from. Is this a good approach, or is there some better way? I'm also unclear on how to refer to the objects I made up for this example, data.frame.getting.named.xxxx, without using the entire filename as it was read into R. Is there something like data.frame[1] to generically refer to the first data frame in the global environment?
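One possible direction, sketched under the assumption that the measurement identifier is always the 10th dot-separated field of the file name: keep the data frames in a named list rather than as loose objects, so they can be looked up by identifier.
site.files <- list.files(pattern = "\\.csv$", recursive = TRUE)
data.list <- lapply(site.files, read.csv)
# name each data frame by the 10th field of its file name, e.g. "WDSD_30"
names(data.list) <- sapply(strsplit(basename(site.files), ".", fixed = TRUE), `[`, 10)
plot(data.list[["WDSD_30"]]$meas)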
I am using a for loop to read in multiple csv files and naming the datasets import1, import2, etc. For example:
assign(paste("import",i,sep=""), read.csv(files[i], header=FALSE))
However, I now want to rename the variables in each dataset. I have tried the following:
names(as.name(paste("import",i,sep=""))) <- c("xxxx", "yyyy")
But I get the error "target of assignment expands to non-language object". (I need to change the variable names within the loop because the names must be different in each dataset.)
Any suggestions on how to do this would be much appreciated.
Thanks.
While I do agree it would be much better to keep your data frames in a list rather than creating a bunch of variables in your global environment, you can also set the names when you read the files in:
assign(paste("import",i,sep=""),
read.csv(files[i], header=FALSE, col.names=c("xxxx", "yyyy")))
Using assign() isn't very "R-like".
A better approach would be to read the files into a list of data.frames, instead of one data.frame object per file. Assuming files is the vector of file names (as you imply above):
import <- lapply(files, read.csv, header=FALSE)
Then if you want to operate on each data.frame in the list using a loop, you easily can:
for (i in seq_along(import)) names(import[[i]]) <- c('xxx', 'yyy')
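Since the variable names must differ between datasets, here is a small sketch of that; the index-suffixed naming scheme is just a hypothetical example:
# give each file its own column names, e.g. xxx1/yyy1 for the first file
import <- lapply(seq_along(files), function(i) {
  read.csv(files[i], header = FALSE,
           col.names = paste0(c("xxx", "yyy"), i))
})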
I am working with very large data layers for an SDM class, and because of this I ended up breaking some of my layers into a bunch of blocks to avoid memory constraints. These blocks were written out as .grd files, and now I need to read them back into R and merge them together. I am extremely new to R and programming in general, so any help would be appreciated. What I have been trying so far looks like this:
library(raster)
merge.coarse <- raster("coarseBlock1.grd")
for (i in 2:nBlocks) {
  merge.coarse <- merge(merge.coarse, raster(paste("coarseBlock", i, ".grd", sep = "")))
}
where my files are named coarseBlockN.grd and are numbered sequentially from 1 to nBlocks (259).
Any feed back would be greatly appreciated.
I'm not too familiar with .grd files, but this overall process should at least get you going in the right direction. Assuming all your .grd files (1 through 259) are stored in the same folder (which I will refer to as GRDFolder), you can try this:
library(raster)
grd.files <- list.files(GRDFolder, pattern = "\\.grd$", full.names = TRUE)
merge.coarse <- raster(grd.files[1])
for (filename in grd.files[-1]) {
  temp <- raster(filename)
  merge.coarse <- merge(merge.coarse, temp)
}
However, growing an object with merge inside a for loop eats up a lot of memory, because R copies the accumulating result on every iteration. (Note that rbind is for rows of tabular data; for Raster objects the combining function is raster's merge.) A more efficient way to do this task is to call lapply (see this tutorial on apply functions for details) to load all the files into a list, which can then be collapsed with a single merge call:
rasters <- lapply(grd.files, raster)
merge.coarse <- do.call(merge, rasters)
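If memory is still tight, raster's merge() can also write its result straight to disk instead of holding it in memory; if I recall correctly it takes a filename argument for this (worth double-checking in the package documentation):
merge.coarse <- do.call(merge, c(rasters, filename = "coarseMerged.grd"))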