I'm trying to save a data frame after every iteration of this loop, appending the loop number to the data frame's name, so that I'm left with 5 data frames, each with a different name.
In my actual code, all the data frames will be different but for simplicity I've just shown one data frame here.
I've supplied some test code below.
testFunction <- function() {
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
name <- paste("name", i, sep = "_")
name <- x
}
}
The example data frames created would be named:
testFunction()
name_1
name_2
name_3
name_4
name_5
However, I'm only getting the final data frame "name_5" to save when the loop completes. My issue is I don't know how to save the ith version of the data frame without escaping from the loop.
Any suggestions on how I can solve this?
***** EDIT *****
I have my for loop inside a function, which might be why assign() is not working. I've amended my example above to show this.
Inside your loop, use assign():
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
assign( paste("name", i, sep = "_") , x)
}
Edit:
As you now want to do this in a function, you would have to specify the environment to assign to. I suspect you want the global environment:
testFunction <- function() {
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
assign( paste("name", i, sep = "_") , x , envir = globalenv() )
}
}
Please be warned that it is not good practice to write a function that edits the enclosing environment. You'd be better off just returning a named list of your data frames, e.g. like so:
testFunction_2 <- function() {
out_list <- vector(mode = "list", length = 5)
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
out_list[[i]] <- x
names(out_list)[i] <- paste("name", i, sep = "_")
}
return(out_list)
}
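As a rough sketch of how the list-returning approach might be used: make_frames() below is a hypothetical lapply()-based equivalent of the function above (the name and the column names a, b, c are assumptions for illustration), followed by a few ways to get at the individual data frames.

```r
# Hypothetical helper: build n data frames and return them as a named list.
make_frames <- function(n = 5) {
  frames <- lapply(seq_len(n), function(i) {
    data.frame(a = 1:10, b = 1:10, c = 10:19)
  })
  # Name the elements name_1, name_2, ...
  setNames(frames, paste("name", seq_len(n), sep = "_"))
}

frames <- make_frames()
names(frames)         # the five element names
frames[["name_3"]]    # one data frame, looked up by name
frames$name_5$c       # a single column of one data frame
```

Keeping the data frames in one list means nothing is written into the caller's workspace, and the whole set can be passed on to lapply() later.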
I have a bunch of csv files that I'm trying to read into R all at once, with each data frame from a csv becoming an element of a list. The loops largely work, but they keep overriding the list elements. So, for example, if I loop over the first 2 files, both data frames in list[[1]] and list[[2]] will contain the data frame for the second file.
#function to open one group of files named with "cores"
open_csv_core<- function(year, orgtype){
file<- paste(year, "/coreco.core", year, orgtype, ".csv", sep = "")
df <- read.csv(file)
names(df) <- tolower(names(df))
df <- df[df$ntee1 %in% c("C","D"),]
df<- df[!(df$nteecc %in% c("D20","D40", "D50", "D60", "D61")),]
return(df)
}
#function to open one group of files named with "nccs"
open_csv_nccs<- function(year, orgtype){
file2<- paste(year, "/nccs.core", year, orgtype, ".csv", sep="")
df2 <- read.csv(file2)
names(df2) <- tolower(names(df2))
df2 <- df2[df2$ntee1 %in% c("C","D"),]
df2<- df2[!(df2$nteecc %in% c("D20","D40", "D50", "D60", "D61")),]
return(df2)
}
#############################################################################
yrpc<- list()
yrpf<- list()
yrco<- list()
fname<- vector()
file_yrs<- as.character(c(1989:2019))
for(i in 1:length(file_yrs)){
fname<- list.files(path = file_yrs[i], pattern = NULL)
#accessing files in a folder and assigning to the proper function to open them based on how the file is named
for(j in 1:length(fname)){
if(grepl("pc.csv", fname[j])==T) {
if(grepl("nccs", fname[j])==T){
a <- open_csv_nccs(file_yrs[j], "pc")
yrpc[[paste0(file_yrs[i], "pc")]] <- a
} else {
b<- open_csv_core(file_yrs[j], "pc")
yrpc[[paste0(file_yrs[i], "pc")]] <- b
}
} else if (grepl("pf.csv", fname[j])==T){
if(grepl("nccs", fname[j])==T){
c <- open_csv_nccs(file_yrs[j], "pf")
yrpf[[paste0(file_yrs[i], "pf")]] <- c
} else {
d<- open_csv_core(file_yrs[j], "pf")
yrpf[[paste0(file_yrs[i], "pf")]] <- d
}
} else {
if(grepl("nccs", fname[j])==T){
e<- open_csv_nccs(file_yrs[j], "co")
yrco[[paste0(file_yrs[i], "co")]] <- e
} else {
f<- open_csv_core(file_yrs[j], "co")
yrco[[paste0(file_yrs[i], "co")]] <- f
}
}
}
}
Actually, both of your csv reading functions do exactly the same thing,
except that the paths are different.
If you list your files with their full paths instead of bare file names, you don't need to reconstruct the paths as you do now. This is possible with full.names = TRUE in list.files().
The second point: it seems that, for a given year and type, there is never an "nccs.core" file in addition to a "coreco.core" file, so the two are mutually exclusive. No logic is needed to distinguish those cases, which simplifies the code.
The third point: you just want to separate the data frames by file type ("pc", "pf", "co") and year.
Instead of creating three lists, one per type, I would create one results list that contains an inner list for each type.
I would solve this like this:
years <- as.character(1989:2019)  # character, so a year works as folder name and as list name
path_to_type <- function(path) gsub(".*(pc|pf|co)\\.csv$", "\\1", path)
res <- list("pc" = list(),
            "pf" = list(),
            "co" = list())
invisible(lapply(years, function(year) {
  files <- list.files(path = year, pattern = "\\.csv$", full.names = TRUE)
  lapply(files, function(path) {
    print(path) # just to signal that the path is getting processed
    df <- read.csv(path)
    file_type <- path_to_type(path)
    names(df) <- tolower(names(df))
    df <- df[df$ntee1 %in% c("C", "D"), ]
    df <- df[!(df$nteecc %in% c("D20", "D40", "D50", "D60", "D61")), ]
    # <<- modifies the outer res; a plain <- would only change a local copy
    res[[file_type]][[year]] <<- df
  })
}))
Now you can pull a data frame from the results list by file type and year,
e.g.:
res[["co"]][["1995"]]
res[["pf"]][["2018"]]
And so on.
The return values of the lapply() calls are not interesting in this case
(hence the invisible()). Only the content of res, the results list, matters.
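It seems that in your for(j in 1:length(fname)){... loop you are creating one of six variables (a through f), and you're reusing these variable names, so they are getting overwritten. Note also that the read functions are called with file_yrs[j] while the list element is named with file_yrs[i]; j counts the files in a folder, not the years, so the wrong year's file may be read.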
The "correct" way to do this is to use lapply in place of the for loop. Pass the list of files, and the required function (i.e. open_csv_core, etc) to lapply, and the return value that you get back is a list of the results.
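A minimal sketch of that lapply() pattern, under stated assumptions: the two tiny CSV files, the folder csv_demo, and the reader read_one() are made up here so that the example runs on its own. They stand in for the real files and for open_csv_core() / open_csv_nccs().

```r
# Set up two small CSV files in a temporary folder (stand-ins for the real data).
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(NTEE1 = c("C", "Z"), VALUE = 1:2),
          file.path(demo_dir, "core1989pc.csv"), row.names = FALSE)
write.csv(data.frame(NTEE1 = c("D", "C"), VALUE = 3:4),
          file.path(demo_dir, "core1990pc.csv"), row.names = FALSE)

# Hypothetical reader: read one file, lower-case the column names, filter rows.
read_one <- function(path) {
  df <- read.csv(path)
  names(df) <- tolower(names(df))
  df[df$ntee1 %in% c("C", "D"), ]
}

files <- list.files(demo_dir, pattern = "pc\\.csv$", full.names = TRUE)
yrpc <- lapply(files, read_one)   # one list element per file, nothing overwritten
names(yrpc) <- basename(files)    # label each element with its file name
```

Because lapply() builds the list itself, there is no index bookkeeping and no temporary variable that could be reused by mistake.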
I have a for loop that assigns multiple data frames to different names, and it works by itself. But when I try to create a function with this for loop, it doesn't work. On top of assigning different names to different data frames, I'm also trying to create a vector that keeps the names of these data frames, but it seems this function doesn't save "dfnames".
create_df <- function(name){
dfnames <- c()
for(i in name){
assign(paste0("subject", i, sep = "_"), passive_subject(i))
dfnames <- c(dfnames, paste0("subject", i, sep = "_"))
dfnames
}
}
How can I go about this?
It would almost certainly be better to return a list of the data frames, and set the names of that list. In general this is a tidier approach than having lots of similar data.frames as separate objects.
create_df <- function(name){
  l <- lapply(name, passive_subject)
  # paste0() has no sep argument; paste() gives names like "subject_1"
  names(l) <- paste("subject", name, sep = "_")
  return(l)
}
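To illustrate, here is a self-contained sketch. passive_subject() is not shown in the question, so the version below is a made-up stand-in that simply returns a small data frame; the point is only how the names vector falls out of the list.

```r
# Stand-in for the real passive_subject(); assumed to return one data frame per subject.
passive_subject <- function(id) data.frame(subject = id, score = id * 10)

create_df <- function(ids) {
  out <- lapply(ids, passive_subject)
  names(out) <- paste("subject", ids, sep = "_")
  out
}

subjects <- create_df(1:3)
dfnames <- names(subjects)   # the vector of names comes with the list
subjects[["subject_2"]]      # look up one data frame by name
```

The separate dfnames bookkeeping inside the loop is no longer needed, since names(subjects) always matches the list's contents.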
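The function has no return value, and assign() called inside a function creates the objects in the function's own environment, where they are lost when it returns. Assign into the global environment and return dfnames, and it works.
create_df <- function(name){
  dfnames <- c()
  for(i in name){
    # paste0() has no sep argument, so use paste() for names like "subject_1"
    df_name <- paste("subject", i, sep = "_")
    assign(df_name, passive_subject(i), envir = .GlobalEnv)
    dfnames <- c(dfnames, df_name)
  }
  return(dfnames)
}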
I am trying to save streamflow data from USGS using the dataRetrieval package in R. It was working until now, but I am not sure what I changed that stopped it from working. This is my code:
siteNumber <- c("094985005","09498501","09489500","09489499","09498502","09511300","09498400","09498500","09489700")
i <- 1
n <- length(siteNumber)
for (i in n) {
Daily_Streamflow <- readNWISdv(siteNumber[i],parameterCd="00060", statCd="00003", "","")
name <- paste("DSF", siteNumber[i], sep = "_")
assign(name, value = Daily_Streamflow)
i <- i + 1
}
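Now it only saves the data for the last station as a data frame. Does anyone know what I am doing wrong?
Read help("for"). A for() loop iterates over a sequence. Here, for (i in n) iterates over the single value n, so the body runs once, for the last station only; iterate over 1:n instead. You also do not need to increment the index explicitly (that is how, e.g., while() loops work):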
for (i in 1:n) {
Daily_Streamflow <- readNWISdv(siteNumber[i],parameterCd="00060", statCd="00003", "","")
name <- paste("DSF", siteNumber[i], sep = "_")
assign(name, value = Daily_Streamflow)
}
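The difference between the two loop headers can be checked without any network access. In the sketch below, fake_fetch() is a hypothetical stand-in for readNWISdv(), and only three of the site numbers are used; the last part shows, as one possible alternative to assign(), collecting the results in a named list.

```r
siteNumber <- c("09498501", "09489500", "09489499")
n <- length(siteNumber)

# Hypothetical stand-in for readNWISdv(), so the sketch runs offline.
fake_fetch <- function(site) data.frame(site_no = site, flow = c(1.5, 2.5))

# for (i in n) loops over the single value n ...
visited_wrong <- c()
for (i in n) visited_wrong <- c(visited_wrong, i)

# ... while seq_along() (or 1:n) visits every index.
visited_right <- c()
for (i in seq_along(siteNumber)) visited_right <- c(visited_right, i)

# Alternative to assign(): keep the results together in one named list.
flows <- lapply(siteNumber, fake_fetch)
names(flows) <- paste("DSF", siteNumber, sep = "_")
```

seq_along(siteNumber) is also safe when the vector is empty, where 1:n would run over c(1, 0).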
I want to create some plots with random numbers in a loop. I want to save the created numbers in separate dataframes for example df1, df2 or df3 but it apparently always overwrites it.
How can I use the i for the dataframes names?
x1 <- c(1:9)
for (i in 1:3)
{
name = paste("Pic_", i, ".png", sep="")
png(name)
x2 <- rnorm(9,2,2)
plot(x1,x2)
df <- data.frame(x1,x2)
dev.off()
}
Try this
for (i in 1:3){
  x1 <- 1:9
  # store a data frame (not just the random vector) under the name df1, df2, df3
  assign(paste0("df", i), data.frame(x1, x2 = rnorm(9, 2, 2)))
  png(paste0("Pic_", i, ".png"))
  plot(x1, get(paste0("df", i))$x2, ylab = paste0("df", i))
  dev.off()
}
The assign() and get() functions are important here. assign() binds a value to a name that is built at run time, which is what you need to create data frames with different names using "i". get() looks an object up by its name, so you can retrieve the data frame you just created, again using "i" to build the right name. Both rely on paste() (or paste0()) so that the name changes with each iteration of the loop.
This should work - you end up with a list of three data frames.
By using df.list[[i]] you're addressing the index i.
x1 <- c(1:9)
df.list <- list()
for (i in 1:3) {
name = paste("Pic_", i, ".png", sep="")
png(name)
x2 <- rnorm(9, 2, 2)
plot(x1, x2)
df.list[[i]] <- data.frame(x1, x2)
dev.off()
}
Each item of the list is a data frame, accessed like you would any other list object:
> is.data.frame(df.list)
[1] FALSE
> is.data.frame(df.list[[1]])
[1] TRUE
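If the names df1, df2, df3 matter, the list elements can carry them. The following is a small sketch of that idea with the plotting left out; set.seed() is only there to make the random numbers reproducible, and all_df is a name introduced here for illustration.

```r
set.seed(1)  # only to make the sketch reproducible
x1 <- 1:9

# One data frame per iteration, collected by lapply() instead of a for loop.
df.list <- lapply(1:3, function(i) data.frame(x1 = x1, x2 = rnorm(9, 2, 2)))
names(df.list) <- paste0("df", 1:3)

df.list$df2                         # access by name
row_counts <- sapply(df.list, nrow) # same operation on every data frame
all_df <- do.call(rbind, df.list)   # stack into one data frame if needed
```

This gives the convenience of named objects without creating separate variables in the workspace.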
I have different dataframes and what I want to do is:
apply a function repeatedly to each data frame
save the result of each repetition in a new data frame that keeps the name of the original data frame plus a suffix to tell the repetitions apart
Here is what I have tried until now
# read all files to list
dataframes <- dir( pattern = ".txt")
list_dataframes <- llply(dataframes, read.csv, header = T, sep =" ", dec=".", na.string = "nd")
n <- length(dataframes)
# apply myfunction 10 times
for (j in 1:10){
modified_list <- llply(list_dataframes, myfunction)
}
if (j <10){
num.char <- paste("n0", j, sep="")
} else num.char <- paste("n", j, sep="")
# save back data frames
for (i in 1:n)
write.table(file = paste( "newfile/_modified",num.char, ".csv", sep = ""),
modified_list[i], row.names = F)
What I want as a result is the modified dataframes (in this case the 10 repetitions for each df of the list)that will have:
the name of the original df
the new name
and the number of iteration
Something like originaldfname_newname_n0
I cannot find where I'm messing up. Any help will be deeply appreciated.
Two major issues, I think:
the } that closes your first for loop (for (j in 1:10)) should come after your second for loop, so that the files are written inside each repetition;
your last line should probably reference modified_list[[i]] instead of using the single-[ notation.
So your code should work (untested, slightly modified for style) as:
library(plyr)
# read all files to list
dataframes <- dir(pattern = ".txt")
list_dataframes <- llply(dataframes, read.csv,
                         header = TRUE, sep = " ", dec = ".", na.strings = "nd")
n <- length(dataframes)
# apply myfunction 10 times
for (j in 1:10) {
  modified_list <- llply(list_dataframes, myfunction)
  # save back data frames, one file per original data frame and repetition
  for (i in 1:n)
    write.table(file = sprintf("newfile/%s_newname_%02d.csv", dataframes[i], j),
                modified_list[[i]], row.names = FALSE)
}
If this were code golf, the last portion could be reduced a little with:
for (j in 1:10) {
mapply(function(df, nm) write.csv(file = sprintf('newfile/%s_newname_%02d.csv', nm, j),
df, row.names = FALSE),
llply(list_dataframes, myfunction), dataframes)
}
(This isn't necessarily clearer, but it is a bit shorter. Use it if at some point you prefer not to use for loops, though the performance in this case will be almost identical.)
Note:
Please include required libraries, e.g., library(plyr).
Though lapply would have worked just fine, I kept the use of llply to match your example.
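For comparison, a base-R sketch of the same write-out loop that runs on its own. Everything in it is a stand-in: the two small data frames, the trivial myfunction(), and a temporary output folder. It also strips the .txt extension from the original name with tools::file_path_sans_ext() and uses an n01-style suffix, which are choices made here for illustration, not part of the answer above.

```r
# Stand-ins so the sketch runs on its own (all hypothetical):
out_dir <- file.path(tempdir(), "newfile")
dir.create(out_dir, showWarnings = FALSE)
list_dataframes <- list(a.txt = data.frame(x = 1:3),
                        b.txt = data.frame(x = 4:6))
myfunction <- function(df) transform(df, x = x + 1)

for (j in 1:2) {
  modified_list <- lapply(list_dataframes, myfunction)
  for (nm in names(modified_list)) {
    base <- tools::file_path_sans_ext(nm)   # "a.txt" becomes "a"
    out  <- file.path(out_dir, sprintf("%s_newname_n%02d.csv", base, j))
    write.csv(modified_list[[nm]], out, row.names = FALSE)
  }
}

written <- sort(list.files(out_dir))
```

Looping over names(modified_list) keeps each output file tied to the data frame it came from, instead of relying on two vectors staying in the same order.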