I have a for loop that assigns multiple data frames to different names, and it works on its own. But when I wrap the loop in a function, it doesn't work. Besides assigning a different name to each data frame, I'm also trying to build a vector that keeps the names of these data frames, but the function doesn't seem to save "dfnames".
create_df <- function(name){
dfnames <- c()
for(i in name){
assign(paste0("subject", i, sep = "_"), passive_subject(i))
dfnames <- c(dfnames, paste0("subject", i, sep = "_"))
dfnames
}
}
How can I go about this?
It would almost certainly be better to return a list of the data frames, and set the names of that list. In general this is a tidier approach than having lots of similar data.frames as separate objects.
create_df <- function(name){
l = lapply(name, passive_subject)
names(l) = paste0("subject", name, sep = "_")
return(l)
}
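As a minimal sketch of how the returned list might be used: passive_subject() below is a hypothetical stand-in (the real function isn't shown in the question), and the names are built as subject_<id>, which is an assumption about the intended format. Note that paste0() has no sep argument, so sep = "_" in the code above is simply pasted on as one more string.

```r
# Hypothetical stand-in for the asker's passive_subject(); the real one is not shown
passive_subject <- function(i) data.frame(subject = i, value = i * 10)

create_df <- function(name) {
  l <- lapply(name, passive_subject)
  # Assumed name format: "subject_1", "subject_2", ...
  names(l) <- paste0("subject_", name)
  l
}

dfs <- create_df(1:3)
names(dfs)           # the vector of names the question wanted to keep
dfs[["subject_2"]]   # a single data frame, looked up by name
```

The names and the data frames travel together in one object, so there is no separate dfnames vector to keep in sync.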
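The function has no return value: its last expression is the for loop, which returns NULL, and assign() creates the data frames in the function's own environment, where they disappear when the function exits. Assigning into the global environment and returning dfnames makes it work.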
create_df <- function(name){
dfnames <- c()
for(i in name){
assign(paste0("subject", i, sep = "_"), passive_subject(i),
envir = .GlobalEnv)
dfnames <- c(dfnames, paste0("subject", i, sep = "_"))
}
return(dfnames)
}
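To see why the original version left nothing behind, here is a small sketch of where assign() puts things. The function and object names are made up for illustration, and a fresh environment stands in for .GlobalEnv so the example doesn't clutter your workspace.

```r
make_local <- function() {
  assign("tmp_local", 1)                  # lands in the function's own environment
  exists("tmp_local", inherits = FALSE)   # visible while the function is running
}

target <- new.env()                       # stand-in for .GlobalEnv in this sketch
make_in_env <- function(env) {
  assign("tmp_env", 2, envir = env)       # lands in the environment passed in
  invisible(NULL)
}

inside <- make_local()
make_in_env(target)

inside                           # whether the object existed inside the function
exists("tmp_local")              # whether it survived after the function returned
get("tmp_env", envir = target)   # the object assigned with an explicit envir
```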
Related
I have a bunch of csv files that I'm trying to read into R all at once, with each data frame from a csv becoming an element of a list. The loops largely work, but they keep overwriting the list elements. So, for example, if I loop over the first 2 files, both data frames in list[[1]] and list[[2]] will contain the data frame for the second file.
#function to open one group of files named with "cores"
open_csv_core<- function(year, orgtype){
file<- paste(year, "/coreco.core", year, orgtype, ".csv", sep = "")
df <- read.csv(file)
names(df) <- tolower(names(df))
df <- df[df$ntee1 %in% c("C","D"),]
df<- df[!(df$nteecc %in% c("D20","D40", "D50", "D60", "D61")),]
return(df)
}
#function to open one group of files named with "nccs"
open_csv_nccs<- function(year, orgtype){
file2<- paste(year, "/nccs.core", year, orgtype, ".csv", sep="")
df2 <- read.csv(file2)
names(df2) <- tolower(names(df2))
df2 <- df2[df2$ntee1 %in% c("C","D"),]
df2<- df2[!(df2$nteecc %in% c("D20","D40", "D50", "D60", "D61")),]
return(df2)
}
#############################################################################
yrpc<- list()
yrpf<- list()
yrco<- list()
fname<- vector()
file_yrs<- as.character(c(1989:2019))
for(i in 1:length(file_yrs)){
fname<- list.files(path = file_yrs[i], pattern = NULL)
#accessing files in a folder and assigning to the proper function to open them based on how the file is named
for(j in 1:length(fname)){
if(grepl("pc.csv", fname[j])==T) {
if(grepl("nccs", fname[j])==T){
a <- open_csv_nccs(file_yrs[j], "pc")
yrpc[[paste0(file_yrs[i], "pc")]] <- a
} else {
b<- open_csv_core(file_yrs[j], "pc")
yrpc[[paste0(file_yrs[i], "pc")]] <- b
}
} else if (grepl("pf.csv", fname[j])==T){
if(grepl("nccs", fname[j])==T){
c <- open_csv_nccs(file_yrs[j], "pf")
yrpf[[paste0(file_yrs[i], "pf")]] <- c
} else {
d<- open_csv_core(file_yrs[j], "pf")
yrpf[[paste0(file_yrs[i], "pf")]] <- d
}
} else {
if(grepl("nccs", fname[j])==T){
e<- open_csv_nccs(file_yrs[j], "co")
yrco[[paste0(file_yrs[i], "co")]] <- e
} else {
f<- open_csv_core(file_yrs[j], "co")
yrco[[paste0(file_yrs[i], "co")]] <- f
}
}
}
}
Both of your CSV-reading functions do exactly the same thing; only the paths differ.
If you list your files with full paths instead of bare file names, you don't need to reconstruct the paths the way you do. This is possible with full.names = TRUE in list.files().
The second point: it seems a "nccs.core" file and a "coreco.core" file never exist for the same year and type, so they are mutually exclusive. No logic is therefore needed to distinguish those cases, which simplifies the code.
The third point: you just want to separate the data frames by file type ("pc", "pf", "co") and year.
Instead of creating three lists, one per type, I would create a single results list that contains an inner list for each type.
I would solve it like this:
years <- 1989:2019
# extract the type suffix ("pc", "pf" or "co") from a file path
path_to_type <- function(path) gsub(".*(pc|pf|co)\\.csv$", "\\1", path)
res <- list("pc" = list(),
            "pf" = list(),
            "co" = list())
invisible(lapply(years, function(year) {
  # list.files() needs a character path, so convert the year
  files <- list.files(path = as.character(year), pattern = "\\.csv$", full.names = TRUE)
  lapply(files, function(path) {
    print(path) # just to signal that the path is getting processed
    df <- read.csv(path)
    file_type <- path_to_type(path)
    names(df) <- tolower(names(df))
    df <- df[df$ntee1 %in% c("C", "D"), ]
    df <- df[!(df$nteecc %in% c("D20", "D40", "D50", "D60", "D61")), ]
    # <<- updates the res defined outside the function; a plain <- would only
    # change a local copy. The year is stored under a character key.
    res[[file_type]][[as.character(year)]] <<- df
  })
}))
Now you can pull data frames out of the results list by file type and year (as a character key), e.g.:
res[["co"]][["1995"]]
res[["pf"]][["2018"]]
And so on.
The return values of the lapply() calls are not of interest here, only the contents of res (the results list).
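Two details matter when filling a results list from inside lapply() like this, and a small sketch may make them concrete. The data below is made up; only the assignment behaviour is the point: a plain <- inside the function changes a local copy, and a numeric year used as an index is a position, not a name.

```r
res <- list(pc = list(), co = list())

# Plain `<-` inside the function only modifies a local copy of res
invisible(lapply(c(1995, 1996), function(year) {
  res[["pc"]][[as.character(year)]] <- data.frame(year = year)
}))
length(res[["pc"]])   # the outer res has not been touched

# `<<-` reaches the res defined outside the function
invisible(lapply(c(1995, 1996), function(year) {
  res[["pc"]][[as.character(year)]] <<- data.frame(year = year)
}))
names(res[["pc"]])    # one entry per year, keyed by character

# A numeric index is positional: assigning at 1995 pads the list with NULLs
padded <- list()
padded[[1995]] <- "x"
length(padded)
```

If you'd rather avoid <<- altogether, another option is to have the inner function return the data frame and build the nested list from the lapply() results afterwards.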
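It seems that in your for(j in 1:length(fname)){... loop you are creating one of six variables (a through f), and you're reusing these variable names, so they get overwritten on every pass. Note also that the reader functions are called with file_yrs[j], indexing the years by the file counter j, where file_yrs[i] looks like what was intended.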
The "correct" way to do this is to use lapply in place of the for loop. Pass the list of files, and the required function (i.e. open_csv_core, etc) to lapply, and the return value that you get back is a list of the results.
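A minimal sketch of that lapply pattern, using two throwaway CSV files in a temporary folder. The file contents and the read_one() helper are made up for illustration; they are not the asker's data or functions.

```r
# Create two small CSV files in a temporary folder (illustrative data only)
folder <- file.path(tempdir(), "lapply_demo")
dir.create(folder, showWarnings = FALSE)
write.csv(data.frame(ntee1 = c("C", "X")), file.path(folder, "a.csv"), row.names = FALSE)
write.csv(data.frame(ntee1 = c("D", "D")), file.path(folder, "b.csv"), row.names = FALSE)

# Hypothetical reader: read one file and keep only the rows of interest
read_one <- function(path) {
  df <- read.csv(path)
  df[df$ntee1 %in% c("C", "D"), , drop = FALSE]
}

files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
dfs <- lapply(files, read_one)   # one list element per file, nothing is overwritten
names(dfs) <- basename(files)
sapply(dfs, nrow)
```

Because each call to read_one() returns its own data frame, there are no shared temporary variables that a later iteration could clobber.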
I would like to make the same changes to the column names of many dataframes. Here's an example:
library(stringr) # provides str_replace_all()
ChangeNames <- function(x) {
colnames(x) <- toupper(colnames(x))
colnames(x) <- str_replace_all(colnames(x), pattern = "_", replacement = ".")
return(x)
}
files <- list(mtcars, nycflights13::flights, nycflights13::airports)
lapply(files, ChangeNames)
I know that lapply only changes a copy. How do I change the underlying dataframe? I want to still use each dataframe separately.
Create a named list, apply the function and use list2env to reflect those changes in the original dataframes.
library(nycflights13)
files <- dplyr::lst(mtcars, flights, airports)
result <- lapply(files, ChangeNames)
list2env(result, .GlobalEnv)
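Here is a small self-contained sketch of the same idea that runs without nycflights13. The two data frames are made up, gsub() stands in for str_replace_all(), and a fresh environment stands in for .GlobalEnv; all of that is assumed for illustration only.

```r
ChangeNames <- function(x) {
  colnames(x) <- toupper(colnames(x))
  colnames(x) <- gsub("_", ".", colnames(x), fixed = TRUE)  # base R in place of str_replace_all
  x
}

df_a <- data.frame(first_col = 1:2)
df_b <- data.frame(other_col = 3:4)

# The list names must match the names of the objects you want to overwrite
files <- list(df_a = df_a, df_b = df_b)
result <- lapply(files, ChangeNames)

env <- new.env()                 # stand-in target; the answer above uses .GlobalEnv
list2env(result, envir = env)
colnames(env$df_a)
colnames(env$df_b)
```

The key point is that list2env() creates (or overwrites) one object per list name, which is why an unnamed list such as list(mtcars, flights) would not work here.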
I'm trying to save a data frame after every iteration of this loop, while appending the data frame with the loop number. So, I'll be left with 5 data frames all with different names.
In my actual code, all the data frames will be different but for simplicity I've just shown one data frame here.
I've supplied some test code below.
testFunction <- function() {
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
name <- paste("name", i, sep = "_")
name <- x
}
}
The example data frames created would be named:
testFunction()
name_1
name_2
name_3
name_4
name_5
However, I'm only getting the final data frame "name_5" to save when the loop completes. My issue is I don't know how to save the ith version of the data frame without escaping from the loop.
Any suggestions on how I can solve this?
***** EDIT *****
I have my for loop inside a function, which might be why assign() is not working. I've appended my example above to show this.
Inside your loop, use assign():
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
assign( paste("name", i, sep = "_") , x)
}
Edit:
As you now want to do this in a function, you would have to specify the environment to assign to. I suspect you want the global environment:
testFunction <- function() {
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
assign( paste("name", i, sep = "_") , x , envir = globalenv() )
}
}
Please be warned that it is not good practice to write a function that modifies an environment outside itself (here, the global environment) as a side effect. You'd be better off just returning a named list of your data frames, e.g. like so:
testFunction_2 <- function() {
out_list <- vector(mode = "list", length = 5)
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
out_list[[i]] <- x
names(out_list)[i] <- paste("name", i, sep = "_")
}
return(out_list)
}
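For what it's worth, the same named list could also be built without an explicit loop. This is just a sketch: testFunction_3 and the column names a, b, c are made up here, and the data frame is the same placeholder as above.

```r
testFunction_3 <- function(n = 5) {
  # Build one data frame per iteration; lapply collects them into a list
  out_list <- lapply(seq_len(n), function(i) {
    data.frame(a = 1:10, b = 1:10, c = 10:19)
  })
  # Attach the names in one step
  setNames(out_list, paste("name", seq_len(n), sep = "_"))
}

frames <- testFunction_3()
names(frames)
frames$name_3   # pull out a single data frame by name
```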
I have several data frames, and what I want to do is:
apply a function repeatedly to each data frame
save the result of each repetition as a new data frame, keeping the name of the original data frame and adding something to differentiate it
Here is what I have tried so far
# read all files to list
dataframes <- dir( pattern = ".txt")
list_dataframes <- llply(dataframes, read.csv, header = T, sep =" ", dec=".", na.string = "nd")
n <- length(dataframes)
# apply myfunction 10 times
for (j in 1:10){
modified_list <- llply(list_dataframes, myfunction)
}
if (j <10){
num.char <- paste("n0", j, sep="")
} else num.char <- paste("n", j, sep="")
# save back data frames
for (i in 1:n)
write.table(file = paste( "newfile/_modified",num.char, ".csv", sep = ""),
modified_list[i], row.names = F)
What I want as a result is the modified data frames (in this case, the 10 repetitions for each df in the list), each with a name made up of:
the name of the original df
the new name
and the iteration number
Something like originaldfname_newname_n0
I can't find where I'm messing up. Any help will be deeply appreciated.
Two major issues, I think:
the } that closes your for (j in 1:10) loop should come after your second for loop;
your last line should probably reference modified_list[[i]] instead of using the single-[ notation.
So your code should work (untested, slightly modified for style) as:
library(plyr)
# read all files to list
dataframes <- dir(pattern = ".txt")
list_dataframes <- llply(dataframes, read.csv,
header = T, sep = " ", dec=".", na.string = "nd")
n <- length(dataframes)
# apply myfunction 10 times
for (j in 1:10) {
modified_list <- llply(list_dataframes, myfunction)
# save back data frames
for (i in 1:n)
write.table(file = sprintf("newfile/%s_newname_%02d.csv", dataframes[i], j),
modified_list[[i]], row.names = FALSE)
}
If this were code golf, the last portion could be reduced a little with:
for (j in 1:10) {
mapply(function(df, nm) write.csv(file = sprintf('newfile/%s_newname_%02d.csv', nm, j),
df, row.names = FALSE),
llply(list_dataframes, myfunction), dataframes)
}
(This doesn't necessarily make the code any clearer, but it does shorten it a bit. Use it if at some point you prefer not to use for loops; performance in this case will be almost identical.)
Note:
Please include required libraries, e.g., library(plyr).
Though lapply would have worked just fine, I kept the use of llply to match your example.
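One thing you may want to tweak: dataframes holds the file names including ".txt", so the output names above would carry that extension too. A hedged sketch of building the names the question asks for, with made-up input names and using tools::file_path_sans_ext() to drop the extension first:

```r
dataframes <- c("siteA.txt", "siteB.txt")   # illustrative input file names
base_names <- tools::file_path_sans_ext(dataframes)

# One output name per (iteration, data frame) pair; %02d zero-pads the iteration
out_names <- unlist(lapply(1:2, function(j) {
  sprintf("%s_newname_n%02d.csv", base_names, j)
}))
out_names
```

Since sprintf() is vectorised over base_names, each iteration j yields one name per input file.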
Just as an example, I have a data frame with columns name, n, mean and sd. How do I extract the elements of a list and save them into a single rda file? The file should contain the generated datasets, not the list.
random.r <- function(df, filename) {
save.random <- function(name, n, mean, sd) {
rn <- rnorm(n=n, mean=mean, sd=sd)
assign(deparse(name), rn)
}
rlist <- sapply(1:nrow(df), function(x)
save.random(df$name[x], df$n[x],df$mean[x],df$sd[x],simplify = FALSE))
save(list = rlist, file = paste(filename,".Rda",sep=""), envir = .GlobalEnv)
}
Cheers
The trick is to tell R where to find the objects referred to in save. To do this, provide the list itself as an environment:
save(list=names(rlist), file=..., envir=as.environment(rlist))
Note also that list must be a vector of object names, so this should be names(rlist), not simply rlist, since the latter is a list of numeric vectors.
The following is a modification of your random.r, which works as you had intended. At the end of this post I also provide simplified code that achieves the same.
random.r <- function(df, filename) {
save.random <- function(name, n, mean, sd) {
rnorm(n=n, mean=mean, sd=sd)
}
rlist <- setNames(lapply(1:nrow(df), function(x) {
save.random(df$name[x], df$n[x], df$mean[x], df$sd[x])
}), df$name)
save(list = names(rlist), file = paste0(filename, ".rda"),
envir = as.environment(rlist))
}
The key changes above are the specification of names(rlist) as the list (vector) of element names that you want to save, and as.environment(rlist) as the environment in which you want R to search for objects with those names. Note also that I've used setNames to correctly assign elements of df$name as the names of the resulting elements of rlist.
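If you want to convince yourself that the file holds the individual objects rather than the list, a quick round trip like the following sketch should show it. The list contents and the temporary file are made up for illustration.

```r
rlist <- list(alpha = 1:3, beta = c(2.5, 3.5))   # illustrative named list

path <- tempfile(fileext = ".rda")
save(list = names(rlist), file = path, envir = as.environment(rlist))

# Load into a fresh environment to inspect what the file contains
loaded <- new.env()
load(path, envir = loaded)
sort(ls(loaded))   # the object names stored in the file
loaded$alpha
```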
A simplified version would be:
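# SIMPLIFY = FALSE guarantees a list (which as.environment() needs), even if all n are equal
rlist <- setNames(mapply(rnorm, d$n, d$mean, d$sd, SIMPLIFY = FALSE), d$name)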
save(list=names(rlist), file='~/../Desktop/foo.rda',
envir=as.environment(rlist))
where d is your data.frame. Here, mapply is a handy shortcut; it steps through the vectors d$n, d$mean and d$sd simultaneously, performing rnorm each time.
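A small sketch of what mapply is doing here, with a made-up three-row d. One caveat worth knowing: when every result has the same length, mapply simplifies to a matrix by default, so SIMPLIFY = FALSE is used below to make sure a list comes back.

```r
d <- data.frame(name = c("x", "y", "z"), n = c(2, 2, 2),
                mean = c(0, 10, 100), sd = c(1, 1, 1),
                stringsAsFactors = FALSE)

set.seed(1)  # only to make the sketch reproducible
# Row by row: rnorm(d$n[1], d$mean[1], d$sd[1]), rnorm(d$n[2], ...), and so on
rlist <- setNames(mapply(rnorm, d$n, d$mean, d$sd, SIMPLIFY = FALSE), d$name)

is.list(rlist)
lengths(rlist)   # one vector of draws per row of d
```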
The simplified code can of course be wrapped into a function if you require, e.g.:
f <- function(x, filename) {
rlist <- setNames(mapply(rnorm, x$n, x$mean, x$sd, SIMPLIFY = FALSE), x$name)
save(list=names(rlist), file=paste0(filename, '.rda'),
envir=as.environment(rlist))
}
d <- data.frame(name=LETTERS, n=sample(100, 26), mean=runif(26), sd=runif(26),
stringsAsFactors=FALSE)
f(d, '~/../Desktop/foo')