R loop over dataframe to create new dataframes - r

Suppose I have several data frames:
dfx01=data.frame(city=c("a","b","c","d"),yr=c(2000,2001,2003,2002))
dfx02=data.frame(city=c("a","e","c","d"),yr=c(2000,2001,2005,2002))
dfx012=data.frame(city=c("f","b","c","d"),yr=c(2000,2000,2001,2002))
dfx022=data.frame(city=c("f","b","c","g"),yr=c(2002,2000,2003,2001))
How should I output corresponding data frames x01, x02, x012, x022 that each contain only the rows where yr is 2001?
I attempted lapply:
dflist=list(dfx01,dfx02,dfx012,dfx022)
lapply(dflist, fun(x){subset(x,startyr=2000)})
But how do I name the four new data frames x01, x02, x012, x022? Thanks.

Your call just needs to be changed a little. Try
lapply(dflist, subset, yr == 2000)
But I prefer [ subsetting, because subset() relies on non-standard evaluation and can have unintended results when used inside other functions. Here's how to do that and add new names at the same time. To set names similar to your data frame names, it's best to add names to the list first.
> # The names must follow the order of the list elements
> names(dflist) <- c("dfx01", "dfx02", "dfx012", "dfx022")
> setNames(lapply(dflist, function(x) x[x$yr==2001, ]),
gsub("df", "", names(dflist)))
# $x01
# city yr
# 2 b 2001
#
# $x02
# city yr
# 2 e 2001
#
# $x012
# city yr
# 3 c 2001
#
# $x022
# city yr
# 4 g 2001
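If separate objects called x01, x02, ... are wanted rather than a named list, list2env() is one way to get them. The following is a minimal sketch that rebuilds the question's data; the name res is introduced here purely for illustration.

```r
dfx01  <- data.frame(city = c("a", "b", "c", "d"), yr = c(2000, 2001, 2003, 2002))
dfx02  <- data.frame(city = c("a", "e", "c", "d"), yr = c(2000, 2001, 2005, 2002))
dfx012 <- data.frame(city = c("f", "b", "c", "d"), yr = c(2000, 2000, 2001, 2002))
dfx022 <- data.frame(city = c("f", "b", "c", "g"), yr = c(2002, 2000, 2003, 2001))

# Name each element explicitly so names and data cannot get out of step
dflist <- list(x01 = dfx01, x02 = dfx02, x012 = dfx012, x022 = dfx022)

# Keep only the 2001 rows of every data frame
res <- lapply(dflist, function(x) x[x$yr == 2001, ])

# Copy each list element into the global environment as x01, x02, ...
invisible(list2env(res, envir = .GlobalEnv))

x02
```

Keeping the results in the list is usually easier to work with afterwards; list2env() is only needed when standalone objects are required.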

Related

Subsetting a dataframe by partial Data point names

I have a dataframe of detections from acoustic receivers. I have about 70 receivers and am looking to subset my data by "line" of receivers. Station names are indicated as such: "TRC1-69", "TRC1-180", "TRC2-69", "TRC2-180".... "TRD1-69", "TRD1-180", "TRD2-69", "TRD2-180". Basically I'm trying to get all the C receivers in one dataframe, the D receivers in one dataframe and so on.
This is what I've tried so far
Dline <- AC[rownames(AC) %like% "TRD", ]
or
Dline <- subset(AC, Station == "TRD")
Here's a way:
df1 <- data.frame(
val = 1:8,
row.names = c("TRC1-69", "TRC1-180", "TRC2-69", "TRC2-180",
"TRD1-69", "TRD1-180", "TRD2-69", "TRD2-180"))
split(df1, substr(row.names(df1),3,3))
# $C
# val
# TRC1-69 1
# TRC1-180 2
# TRC2-69 3
# TRC2-180 4
#
# $D
# val
# TRD1-69 5
# TRD1-180 6
# TRD2-69 7
# TRD2-180 8
You can use a simple regex via gsub, i.e. (using @Moody_Mudskipper's data set)
split(df1, gsub('(.*)[0-9]+-[0-9]+', '\\1', rownames(df1)))
#$`TRC`
# val
#TRC1-69 1
#TRC1-180 2
#TRC2-69 3
#TRC2-180 4
#$TRD
# val
#TRD1-69 5
#TRD1-180 6
#TRD2-69 7
#TRD2-180 8
We can use grepl in subset when only a partial match is needed
subset(AC, grepl("^TRD", Station))
and, to get both groups in one step, split into a list of two data.frames (named FALSE and TRUE)
lst1 <- split(AC, grepl("^TRD", AC$Station))
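When the line identifier lives in a Station column rather than in the row names, the same split() idea gives one data frame per line. This is a sketch only: the AC data frame below is a hypothetical stand-in (only the station naming pattern comes from the question, the Detections values are made up), and it assumes every station name starts with "TR" followed by a single line letter.

```r
# Hypothetical stand-in for the asker's AC data
AC <- data.frame(
  Station    = c("TRC1-69", "TRC1-180", "TRC2-69", "TRD1-69", "TRD2-180"),
  Detections = c(4, 7, 1, 3, 9)  # made-up values
)

# Extract the line letter that follows the "TR" prefix
line <- sub("^TR([A-Z]).*$", "\\1", AC$Station)

# One data frame per receiver line, named by that letter
by_line <- split(AC, line)
names(by_line)
```

by_line$D then holds all D receivers, by_line$C all C receivers, and so on for any further lines.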

How to create a loop inside a loop in R

I am new to R and I need some help with loops. I need to produce a huge number of tables from one data set, and I think that a loop inside a loop will solve the problem, but I am having trouble getting the right result.
Let's say I have the following data set:
var1 <- c("A","A","A","A","B","B","B","B")
var2 <- c(1,2,1,2,1,2,1,2)
df <- data.frame(var1,var2)
And I want to extract the data in 4 tables:
Result of "A" & 1
Result of "A" & 2
Result of "B" & 1
Result of "B" & 2
I have this loop, but I cannot get the 4 tables. Can anyone help?
for (i in df$var1) {
dummy<- df%>%filter(var1 == i)
for (j in dummy$var2) {
nTab <- paste0("tab_", j, sep ="")
assign(nTab, dummy%>%filter (var2 == j))
}
}
Expanding on @Gregor's comment and the related question "Save all data frames in list to separate .csv files", you can use Map() with the split() function to write each newly created data frame to its own csv file:
Code:
s <- split(df, f = paste(df$var1, df$var2, sep = "_"))
Map(write.csv, s, paste0("table_", names(s), ".csv"), row.names = FALSE)
This writes the csv files to your current working directory with the names "table_A_1.csv", etc., based on the values of var1 and var2.
We can split the data frame into several data frames based on two columns and store them in a list.
df_list <- split(df, f = list(df$var1, df$var2))
df_list
# $A.1
# var1 var2
# 1 A 1
# 3 A 1
#
# $B.1
# var1 var2
# 5 B 1
# 7 B 1
#
# $A.2
# var1 var2
# 2 A 2
# 4 A 2
#
# $B.2
# var1 var2
# 6 B 2
# 8 B 2
To save the data frames in the list, we can further use the lapply function.
lapply(names(df_list), function(x) write.csv(df_list[[x]], paste0(x, ".csv"), row.names = FALSE))
Here df_list[[x]] accesses an individual data frame by name, and paste0(x, ".csv") constructs the file name.
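For comparison, the nested loop from the question can also be made to work. It lost tables for two reasons: it iterated over every row value instead of the unique values, and the table name used only j, so later results overwrote earlier ones. A minimal sketch in base R follows; the list name tabs is introduced here for illustration.

```r
var1 <- c("A", "A", "A", "A", "B", "B", "B", "B")
var2 <- c(1, 2, 1, 2, 1, 2, 1, 2)
df <- data.frame(var1, var2, stringsAsFactors = FALSE)

tabs <- list()
for (i in unique(df$var1)) {
  for (j in unique(df$var2[df$var1 == i])) {
    # Use both i and j in the name so tables do not overwrite each other
    tabs[[paste0("tab_", i, "_", j)]] <- df[df$var1 == i & df$var2 == j, ]
  }
}
names(tabs)
```

Collecting the tables in a list instead of calling assign() keeps them together, so they can be passed straight to lapply() or Map().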

Changing dataframes in Loop r

I have many dataframes that I would like to run through the same code. Is there a way to change the dataframe name in a loop?
df01$x = rnorm(100)
df02$x = rnorm(100)+2
df03$x = rnorm(100)*2
dflist <- c("df01",
"df02",
"df03")
for (i in 1:length(dflist){
{
#complete tasks by changing df name in existing code
ifelse([[i]]$x > 0,1,[[i]]$x)
}
#I want to do this for a number of different fuctions, so it is best to change the df name before "$"
df[[i]]$Varible = aggregate(df$Varible, .. ,..)}
Consider working with a collection of data frames in a list rather than passing around string literals of their names. Within an lapply() call you can handle your operations, and then even convert each element back into an individual dataframe object:
df01 <- data.frame(x=rnorm(100))
df02 <- data.frame(x=rnorm(100)+2)
df03 <- data.frame(x=rnorm(100)*2)
# LIST OF DATAFRAMES (NAMED WITH LITERALS)
dfList <- list(df01=df01, df02=df02, df03=df03)
head(dfList[[1]])
# x
# 1 1.3440091
# 2 0.5838570
# 3 -0.2377477
# 4 -1.5059733
# 5 -1.0605490
# 6 -0.8122972
# ITERATIVELY RUN COLUMN OPERATION
dfList <- lapply(dfList, function(df) {
df$x <- ifelse(df$x > 0, 1, df$x)
return(df)
})
# CONVERT LIST ITEMS INTO INDIVIDUAL ENVIRON OBJECTS
list2env(dfList, envir=.GlobalEnv)
head(df01)
# x
# 1 1.0000000
# 2 1.0000000
# 3 -0.2377477
# 4 -1.5059733
# 5 -1.0605490
# 6 -0.8122972
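If the data frames already exist and only a character vector of their names is at hand, as in the question, mget() can build the named list from those names. This is a sketch under that assumption; set.seed() and the name dfnames are additions for illustration.

```r
set.seed(1)  # only to make the sketch reproducible
df01 <- data.frame(x = rnorm(100))
df02 <- data.frame(x = rnorm(100) + 2)
df03 <- data.frame(x = rnorm(100) * 2)

dfnames <- c("df01", "df02", "df03")

# mget() looks up each name and returns a list named after the objects
dfList <- mget(dfnames)

# Same column operation, applied to every data frame in the list
dfList <- lapply(dfList, function(df) {
  df$x <- ifelse(df$x > 0, 1, df$x)
  df
})

# Largest value per data frame; nothing above 1 remains
sapply(dfList, function(df) max(df$x))
```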

Loop rename variables in R with assign()

I am trying to rename a variable across several data frames, but assign won't work. Here is the code I am trying:
assign(colnames(eval(as.name(DataFrameX)))[[3]], "<- NewName")
# The idea is, go through every dataset, and change the name of column 3 to
# "NewName" in all of them
This doesn't return any error (all other versions I could think of returned some kind of error), but it doesn't change the variable name either.
I am using a loop to create several data frames and different variables within each. Now I need to rename some of those variables so that the data frames can be merged into one at a later stage. All of that works, except for the renaming. If I type the names of the data frame and variable myself in a regular call, colnames(DF)[[3]] <- "NewName", it works, but when I try to use assign so that it is done in a loop, it doesn't do anything.
Here is what you can do with a loop over all data frames in your environment. Since you select only the data frames in your environment, you avoid the risk of touching any other variable. The key point is that you have to assign the modified copy back to each data frame's name within the loop.
df1 <- data.frame(q=1,w=2,e=3)
df2 <- data.frame(q=1,w=2,e=3)
df3 <- data.frame(q=1,w=2,e=3)
# > df1
# q w e
# 1 1 2 3
# > df2
# q w e
# 1 1 2 3
# > df3
# q w e
# 1 1 2 3
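# Names of all data frames in the global environment
DFs <- names(which(sapply(.GlobalEnv, is.data.frame)))
for (i in seq_along(DFs)) {
df <- get(DFs[i])
colnames(df)[3] <- "newName"
assign(DFs[i], df) # write the modified copy back under its original name
}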
# > df1
# q w newName
# 1 1 2 3
# > df2
# q w newName
# 1 1 2 3
# > df3
# q w newName
# 1 1 2 3
We could use eapply() (see ?eapply) to apply setnames() from the data.table package to all data frames in your global environment. setnames() changes the names by reference, so no reassignment is needed.
library(data.table)
eapply(.GlobalEnv, function(x) if (is.data.frame(x)) setnames(x, 3, "NewName"))
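Another option is to avoid assign() altogether by keeping the data frames in a list, which also makes the later merging step easier. The sketch below uses base R only; the list name dfs and the example values are hypothetical.

```r
# Hypothetical data frames collected in a named list
dfs <- list(
  df1 = data.frame(q = 1, w = 2, e = 3),
  df2 = data.frame(q = 4, w = 5, e = 6),
  df3 = data.frame(q = 7, w = 8, e = 9)
)

# Rename column 3 in every data frame and return the modified copy
dfs <- lapply(dfs, function(d) {
  names(d)[3] <- "NewName"
  d
})

# Column names of each data frame after renaming
sapply(dfs, names)
```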

r create a column that contains the objects names inside a lapply function

I would like to create a column that contains the object's name inside an lapply function. As a placeholder I call it name.of.x.as.strig.function(). Unfortunately I am not sure how to do it; maybe with a combination of assign, do.call and paste. But so far using these functions has only led me into deeper trouble, and I am quite sure there is a more R-like solution.
# generates a list of dataframes,
data <- list(data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)))
# assigns names to dataframe
names(data) <- list("one","two", "tree", "four")
# subsets the second column into the object data.anova
data.anova <- lapply(data, function(x){x <- x[[2]];
return(matrix(x))})
This should allow me to create a column inside the dataframe that contains its name, for all matrices inside the list
data.anova <- lapply(data, function(x){
x$id <- name.of.x.as.strig.function(x)
return(x)})
I would like to retrieve:
3 one
3 one
3 two
3 two
...
Any input is highly appreciated.
Search history: function to retrieve object name as string, R get name of an object inside lapply...
Can it be that you are just looking for stack?
stack(lapply(data, `[[`, 2))
# values ind
# 1 3 one
# 2 3 one
# 3 3 two
# 4 3 two
# 5 3 tree
# 6 3 tree
# 7 3 four
# 8 3 four
(Or, using your original approach: stack(lapply(data, function(x) {x <- x[[2]]; x})))
If this is the case, melt from "reshape2" would also work.
Loop through the indices of data.anova, and use that to fetch both the data and the names:
data.anova <- lapply(seq_along(data.anova), function(i){
x <- as.data.frame(data.anova[[i]])
x$id <- names(data.anova)[i]
return(x)})
This produces:
# [[1]]
# V1 id
# 1 3 one
# 2 3 one
# [[2]]
# V1 id
# 1 3 two
# 2 3 two
# [[3]]
# V1 id
# 1 3 tree
# 2 3 tree
# [[4]]
# V1 id
# 1 3 four
# 2 3 four
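A variant of the same idea passes each element together with its name via Map() and then stacks the pieces with rbind(). This is a sketch with a shortened version of the question's list; the column names a, b, value and the object names with_id and result are illustrative assumptions.

```r
# Shortened, named version of the list from the question
data <- list(
  one = data.frame(a = c(1, 2), b = c(3, 3)),
  two = data.frame(a = c(1, 2), b = c(3, 3))
)

# Map() hands each data frame and its name to the function in parallel
with_id <- Map(function(x, nm) data.frame(value = x[[2]], id = nm),
               data, names(data))

# Stack the pieces into a single data frame and drop the generated row names
result <- do.call(rbind, with_id)
rownames(result) <- NULL
result
```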
