Changing data frames in a loop in R

I have many data frames that I would like to run through the same code. Is there a way to change the data frame name in a loop?
df01 <- data.frame(x = rnorm(100))
df02 <- data.frame(x = rnorm(100) + 2)
df03 <- data.frame(x = rnorm(100) * 2)
dflist <- c("df01",
            "df02",
            "df03")
for (i in 1:length(dflist)) {
  # complete tasks by changing the df name in existing code, e.g.
  # ifelse([[i]]$x > 0, 1, [[i]]$x)
  # I want to do this for a number of different functions, so it is best
  # to change the df name before "$", e.g.
  # df[[i]]$Variable = aggregate(df$Variable, .. ,..)
}

Consider working with the data frames as a collection in a list rather than passing string literals of their names. Inside an lapply() call you can run your operations, and then even convert each list element back into an individual data frame object:
df01 <- data.frame(x=rnorm(100))
df02 <- data.frame(x=rnorm(100)+2)
df03 <- data.frame(x=rnorm(100)*2)
# LIST OF DATAFRAMES (NAMED WITH LITERALS)
dfList <- list(df01=df01, df02=df02, df03=df03)
head(dfList[[1]])
# x
# 1 1.3440091
# 2 0.5838570
# 3 -0.2377477
# 4 -1.5059733
# 5 -1.0605490
# 6 -0.8122972
# ITERATIVELY RUN COLUMN OPERATION
dfList <- lapply(dfList, function(df) {
df$x <- ifelse(df$x > 0, 1, df$x)
return(df)
})
# CONVERT LIST ITEMS INTO INDIVIDUAL ENVIRON OBJECTS
list2env(dfList, envir=.GlobalEnv)
head(df01)
# x
# 1 1.0000000
# 2 1.0000000
# 3 -0.2377477
# 4 -1.5059733
# 5 -1.0605490
# 6 -0.8122972
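If, as in the question, you start from a character vector of names such as dflist, base R's mget() looks the objects up by name and returns them as a list named after that vector, so you never have to paste a name in front of $. A minimal sketch reusing the df01 to df03 example and the ifelse() step from above; writing back with list2env() is optional:

```r
# Example data frames, as in the question
df01 <- data.frame(x = rnorm(100))
df02 <- data.frame(x = rnorm(100) + 2)
df03 <- data.frame(x = rnorm(100) * 2)
dflist <- c("df01", "df02", "df03")

# mget() fetches each object by name; the result is a named list
dfList <- mget(dflist)

# Run the same operation on every data frame in the list
dfList <- lapply(dfList, function(df) {
  df$x <- ifelse(df$x > 0, 1, df$x)
  df
})

# Optionally overwrite the originals in the current environment
# (the global environment when this is run at top level)
invisible(list2env(dfList, envir = environment()))
```

Keeping the list and working on dfList directly is usually simpler than copying the results back into loose objects.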

Related

How to create a loop inside a loop in R

I am new to R and I need some help with loops. I need to produce a large number of tables from one data set, and I think that a loop inside a loop will solve the problem, but I am having trouble getting the right result.
Let's say I have the following data set:
var1 <- c("A","A","A","A","B","B","B","B")
var2 <- c(1,2,1,2,1,2,1,2)
df <- data.frame(var1,var2)
And I want to extract the data into 4 tables:
Result of "A" & 1
Result of "A" & 2
Result of "B" & 1
Result of "B" & 2
I have this loop, but I cannot get the 4 tables. Can anyone help?
library(dplyr)  # provides %>% and filter()
for (i in df$var1) {
  dummy <- df %>% filter(var1 == i)
  for (j in dummy$var2) {
    nTab <- paste0("tab_", j)
    assign(nTab, dummy %>% filter(var2 == j))
  }
}
Expanding on @Gregor's comment and on the question "Save all data frames in list to separate .csv files", you can use Map() with the split() function to write the newly created data frames to individual CSV files:
Code:
s <- split(df, f = paste(df$var1, df$var2, sep = "_"))
Map(write.csv, s, paste0("table_", names(s), ".csv"), row.names = FALSE)
This writes the CSV files to your current working directory, named "table_A_1.csv" and so on, based on the values of var1 and var2.
We can split the data frame into several data frames based on two columns and store them in a list.
df_list <- split(df, f = list(df$var1, df$var2))
df_list
# $A.1
# var1 var2
# 1 A 1
# 3 A 1
#
# $B.1
# var1 var2
# 5 B 1
# 7 B 1
#
# $A.2
# var1 var2
# 2 A 2
# 4 A 2
#
# $B.2
# var1 var2
# 6 B 2
# 8 B 2
To save the data frames in the list, we can further use the lapply function.
lapply(names(df_list), function(x) write.csv(df_list[[x]], paste0(x, ".csv"), row.names = FALSE))
Here df_list[[x]] accesses each data frame by name, and paste0(x, ".csv") builds the file name (relative to the current working directory).
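If you really do want separate objects, as the original nested loop attempts, the loop needs two changes: iterate over unique() values instead of every row, and build the object name from both i and j so the "B" tables do not overwrite the "A" tables. A minimal base-R sketch; the tab_A_1-style names are an assumption modelled on the question's tab_ prefix:

```r
var1 <- c("A", "A", "A", "A", "B", "B", "B", "B")
var2 <- c(1, 2, 1, 2, 1, 2, 1, 2)
df <- data.frame(var1, var2)

for (i in unique(df$var1)) {
  for (j in unique(df$var2)) {
    # Name combines both grouping values, e.g. "tab_A_1"
    nTab <- paste0("tab_", i, "_", j)
    # Keep only the rows for this (var1, var2) combination
    assign(nTab, df[df$var1 == i & df$var2 == j, ])
  }
}
```

The split() answers are usually preferable, though, because a list is easier to loop over later than a set of loose objects.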

Loop rename variables in R with assign()

I am trying to rename a variable across several data frames, but assign won't work. Here is the code I am trying:
assign(colnames(eval(as.name(DataFrameX)))[[3]], "<- NewName")
# The idea is, go through every dataset, and change the name of column 3 to
# "NewName" in all of them
This won't return any error (All other versions I could think of returned some kind of error), but it doesn't change the variable name either.
I am using a loop to create several data frames with different variables in each; now I need to rename some of those variables so that the data frames can be merged into one at a later stage. All of that works except the renaming. If I type the data frame name myself in a regular call, colnames(DF)[[3]] <- "NewName" works, but when I try to use assign so that it is done in a loop, it doesn't do anything.
Here is what you can do with a loop over all data frames in your environment. Because the loop only selects data frames, it won't touch any other objects. The key point is that inside the loop you must assign the modified copy back to each data frame's name.
df1 <- data.frame(q=1,w=2,e=3)
df2 <- data.frame(q=1,w=2,e=3)
df3 <- data.frame(q=1,w=2,e=3)
# > df1
# q w e
# 1 1 2 3
# > df2
# q w e
# 1 1 2 3
# > df3
# q w e
# 1 1 2 3
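# Names of all data frames in the global environment
DFs <- names(which(sapply(.GlobalEnv, is.data.frame)))
for (i in seq_along(DFs)) {
  df <- get(DFs[i])             # fetch a copy of the data frame by name
  colnames(df)[3] <- "newName"  # rename its third column
  assign(DFs[i], df)            # write the copy back under the original name
}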
# > df1
# q w newName
# 1 1 2 3
# > df2
# q w newName
# 1 1 2 3
# > df3
# q w newName
# 1 1 2 3
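You could use eapply() to apply setnames() from the data.table package to every data frame in your global environment. Because setnames() renames by reference, the data frames are changed in place and nothing needs to be assigned back: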
library(data.table)
eapply(.GlobalEnv, function(x) if (is.data.frame(x)) setnames(x, 3, "NewName"))
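The list-based pattern used with lapply() elsewhere on this page also works here and avoids get()/assign() pairs: collect the data frames with mget(), rename column 3 inside lapply(), and write them back with list2env(). A sketch assuming the df1 to df3 example from above and that every selected data frame has at least three columns:

```r
df1 <- data.frame(q = 1, w = 2, e = 3)
df2 <- data.frame(q = 1, w = 2, e = 3)
df3 <- data.frame(q = 1, w = 2, e = 3)

# Names to process; to pick up every data frame instead, use
# Filter(function(nm) is.data.frame(get(nm)), ls())
DFs <- c("df1", "df2", "df3")

# Gather the data frames into a named list
dfs <- mget(DFs)

# Rename the third column of each one
dfs <- lapply(dfs, function(d) {
  names(d)[3] <- "NewName"
  d
})

# Write the renamed copies back under their original names
invisible(list2env(dfs, envir = environment()))
```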

cor.test into data.frame in R

Consider the following example:
library(MuMIn)
data(Cement)
d <- data.frame(Cement)
idx <- seq(11,13)
cor1 <- list()
for (i in 1:length(idx)){
d2 <- d[1:idx[i],]
cor1[[i]] <- cor.test(d2$X1,d2$X2, method = "pearson")
}
out <- lapply(cor1, function(x) c(x$estimate, x$conf.int, x$p.value))
Here I calculate the correlation for a growing subset of the data (rows 1 to 11, 1 to 12 and 1 to 13) within a loop.
I now want to generate one data.frame made up of the values in the list 'out'. I tried using
df <- do.call(rbind.data.frame, out)
but the result does not seem right:
> df
c.0.129614123011664..0.195326511912326..0.228579470307565.
1 0.1296141
2 0.1953265
3 0.2285795
c..0.509907346173941...0.426370467476045...0.368861726657293.
1 -0.5099073
2 -0.4263705
3 -0.3688617
c.0.676861607564929..0.691690831088494..0.692365536706126.
1 0.6768616
2 0.6916908
3 0.6923655
c.0.704071702633775..0.542941653020805..0.452566184329491.
1 0.7040717
2 0.5429417
3 0.4525662
This is not what I am after.
How can I generate a data.frame whose first column states which list element the cor.test was calculated for (1 to 3 in this case), followed by columns for $estimate, $conf.int and $p.value, resulting in a five-column data.frame?
Is this what you're trying to do? Your question is a bit hard to understand. Is a column of indices from the list really necessary? The whole first column will be exactly the same as the row names (which appear on the left-hand side).
> D <- data.frame(cbind(index = seq(length(out)), do.call(rbind, out)))
> names(D)[2:ncol(D)] <- c('estimate', paste0('conf.int', 1:2), 'p.value')
> D
index estimate conf.int1 conf.int2 p.value
1 1 0.1296141 -0.5099073 0.6768616 0.7040717
2 2 0.1953265 -0.4263705 0.6916908 0.5429417
3 3 0.2285795 -0.3688617 0.6923655 0.4525662
It's not entirely clear what you're asking: you already have such a data frame, just without sensible column names. You can simplify your code to:
ctests <- lapply(idx, function(x) cor.test(d[1:x,"X1"], d[1:x, "X2"]))
ctests <- lapply(ctests, "[", c("estimate", "conf.int", "p.value"))
as.data.frame(do.call(rbind, lapply(ctests, unlist)))
# estimate.cor conf.int1 conf.int2 p.value
# 1 0.1296141 -0.5099073 0.6768616 0.7040717
# 2 0.1953265 -0.4263705 0.6916908 0.5429417
# 3 0.2285795 -0.3688617 0.6923655 0.4525662
Is this what you need?
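Another option is to build each result as a one-row data frame with the column names you want and then rbind() the rows, so no renaming is needed afterwards. A sketch; because MuMIn may not be installed, it uses the built-in mtcars data (mpg against wt) as a stand-in for Cement, so its numbers will differ from those shown above, and the column names conf.low and conf.high are illustrative:

```r
d <- mtcars          # stand-in for data.frame(Cement)
idx <- seq(11, 13)   # subset sizes, as in the question

rows <- lapply(seq_along(idx), function(i) {
  d2 <- d[1:idx[i], ]
  ct <- cor.test(d2$mpg, d2$wt, method = "pearson")
  # One row per test, with explicit column names
  data.frame(index     = i,
             estimate  = unname(ct$estimate),
             conf.low  = ct$conf.int[1],
             conf.high = ct$conf.int[2],
             p.value   = ct$p.value)
})

result <- do.call(rbind, rows)
result
```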

R: create a column that contains the object names inside an lapply function

I would like to create a column that contains the object names inside an lapply function; as a placeholder I call it name.of.x.as.strig.function(). Unfortunately I am not sure how to do it, maybe with a combination of assign, do.call and paste, but so far that has only led me into deeper trouble. I am quite sure there is a more R-like solution.
# generates a list of dataframes,
data <- list(data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)))
# assigns names to dataframe
names(data) <- list("one","two", "tree", "four")
# subsets the second column into the object data.anova
data.anova <- lapply(data, function(x){x <- x[[2]];
return(matrix(x))})
This should allow me to create a column inside the dataframe that contains its name, for all matrices inside the list
data.anova <- lapply(data, function(x){
x$id <- name.of.x.as.strig.function(x)
return(x)})
I would like to retrieve:
3 one
3 one
3 two
3 two
...
Any input is highly appreciated.
Search history: function to retrieve object name as string, R get name of an object inside lapply...
Can it be that you are just looking for stack?
stack(lapply(data, `[[`, 2))
# values ind
# 1 3 one
# 2 3 one
# 3 3 two
# 4 3 two
# 5 3 tree
# 6 3 tree
# 7 3 four
# 8 3 four
(Or, using your original approach: stack(lapply(data, function(x) {x <- x[[2]]; x})))
If this is the case, melt from "reshape2" would also work.
Loop through the indices of data.anova, and use that to fetch both the data and the names:
data.anova <- lapply(seq_along(data.anova), function(i){
x <- as.data.frame(data.anova[[i]])
x$id <- names(data.anova)[i]
return(x)})
This produces:
# [[1]]
# V1 id
# 1 3 one
# 2 3 one
# [[2]]
# V1 id
# 1 3 two
# 2 3 two
# [[3]]
# V1 id
# 1 3 tree
# 2 3 tree
# [[4]]
# V1 id
# 1 3 four
# 2 3 four
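A function called by lapply() receives each element but not its name, which is why a name.of.x.as.strig.function() is hard to write. Map() can pass the names alongside the elements instead. A minimal sketch using the data list from the question; the column names value and id are illustrative:

```r
data <- list(data.frame(c(1, 2), c(3, 3)), data.frame(c(1, 2), c(3, 3)),
             data.frame(c(1, 2), c(3, 3)), data.frame(c(1, 2), c(3, 3)))
names(data) <- c("one", "two", "tree", "four")

# Map() walks over the elements and their names in parallel
tagged <- Map(function(x, nm) {
  data.frame(value = x[[2]], id = nm)
}, data, names(data))

# Stack the per-element results into one data frame
result <- do.call(rbind, tagged)
rownames(result) <- NULL
result
```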

Conditional lapply

I have a bunch of data frames in a list. The frames are organised like this:
ID Category Value
2323 Friend 23.40
3434 Foe -4.00
And I got them into a list by following this topic. I can also run simple functions on them as shown in this topic.
Now I am trying to run a conditional function with lapply, and I'm running into trouble. In some tables the 'ID' column has a different name (say, 'recnum'), and I need to tell lapply to go through each data frame, check if there is a column named 'recnum', and change its name to 'ID', as in
colnr <- which(names(x) == "recnum")
if (length(colnr > 0)) {names(x)[colnr] <- "ID"}
But I'm running into trouble with local scope and who knows what. Any ideas?
Use the rename function from plyr; it renames by name, not position:
x <- data.frame(ID = 1:2,z=1:2)
y <- data.frame('recnum' = 1:2,z=3:4)
.list <- list(x,y)
library(plyr)
lapply(.list, rename, replace = c('recnum' = 'ID'))
# [[1]]
# ID z
# 1 1 1
# 2 2 2
# [[2]]
# ID z
# 1 1 3
# 2 2 4
Your original code works fine:
foo <- function(x){
colnr <- which(names(x) == "recnum")
if (length(colnr) > 0) {names(x)[colnr] <- "ID"}  # rename only if a match exists
x
}
.list <- list(x,y)
lapply(.list, foo)
Not sure what your problem was.
If you look at the second part of mnel's answer, you can see that foo returns x as its last expression. Without that line, the function returns the value of the if() expression ("ID" or NULL) instead of the modified data frame. lapply() also works on copies, so changing names inside the anonymous function never alters the data frames in the original list; you have to return x and keep the result of lapply().
Just as an alternative, you could use gsub and avoid loading an additional package (although plyr is a nice package):
xx <- list(data.frame("recnum" = 1:3, "recnum2" = 1:3),
data.frame("ID" = 4:6, "hat" = 4:6))
lapply(xx, function(x){
names(x) <- gsub("^recnum$", "ID", names(x))
return(x)
})
# [[1]]
# ID recnum2
# 1 1 1
# 2 2 2
# 3 3 3
# [[2]]
# ID hat
# 1 4 4
# 2 5 5
# 3 6 6
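In base R the check can also be folded into a logical index: when no column is called "recnum" the index selects nothing and the assignment is a no-op, so no if() is needed. A sketch using the x and y data frames from the plyr example:

```r
x <- data.frame(ID = 1:2, z = 1:2)
y <- data.frame(recnum = 1:2, z = 3:4)
.list <- list(x, y)

renamed <- lapply(.list, function(df) {
  # Logical index selects only the column(s) named "recnum"
  names(df)[names(df) == "recnum"] <- "ID"
  df  # return the modified copy
})
```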
