Loop rename variables in R with assign() - r

I am trying to rename a variable over several data frames, but assign wont work. Here is the code I am trying
assign(colnames(eval(as.name(DataFrameX)))[[3]], "<- NewName")
# The idea is, go through every dataset, and change the name of column 3 to
# "NewName" in all of them
This won't return any error (All other versions I could think of returned some kind of error), but it doesn't change the variable name either.
I am using a loop to create several data frames and different variables within each, now I need to rename some of those variables so that the data frames can be merged in one at a later stage. All that works, except for the renaming. If I input myself the names of the dataframe and variables in a regular call with colnames(DF)[[3]] <- "NewName", but somehow when I try to use assign so that it is done in a loop, it doesn't do anything.

Here is what you can do with a loop over all data frames in your environment. Since you are looking for just data frame in your environment, you are immune of the risk to touch any other variable. The point is that you should assign new changes to each data frame within the loop.
df1 <- data.frame(q=1,w=2,e=3)
df2 <- data.frame(q=1,w=2,e=3)
df3 <- data.frame(q=1,w=2,e=3)
# > df1
# q w e
# 1 1 2 3
# > df2
# q w e
# 1 1 2 3
# > df3
# q w e
# 1 1 2 3
DFs=names(which(sapply(.GlobalEnv, is.data.frame)))
for (i in 1:length(DFs)){
df=get(paste0(DFs[i]))
colnames(df)[3]="newName"
assign(DFs[i], df)
}
# > df1
# q w newName
# 1 1 2 3
# > df2
# q w newName
# 1 1 2 3
# > df3
# q w newName
# 1 1 2 3

We could try ?eapply() to apply setnames() from the data.table package to all data.frame's in your global enviromnent.
library(data.table)
eapply(.GlobalEnv, function(x) if (is.data.frame(x)) setnames(x, 3, "NewName"))

Related

Compute average of z-scores of several columns in a data.table

In R, I have a data table and a character vector with a subset of the data table's column names. I need to compute the z-scores (i.e. number of standard deviations from the mean) of each column with a specified name, and put the averages of the z-scores in a new column. I found a solution with explicit for-loops (posted below), but this must be a common enough task that some library function could be made to do the work more elegantly. Is there a better way?
Here's my solution:
#! /usr/bin/env RSCRIPT
library(data.table)
# Sample data table.
dt <- data.table(a=1:3, b=c(5, 6, 3), c=2:4)
# List of column names.
cols <- c('a', 'b')
# Convert columns to z-scores, and add each to a new list of vectors.
zscores <- list()
for (colIx in 1:length(cols)) {
zscores[[colIx]] <- scale(dt[,get(cols[colIx])], center=TRUE, scale=TRUE)
}
# Average corresponding entries of each vector of z-scores.
avg <- numeric(nrow(dt))
for (rowIx in 1:nrow(dt)) {
avg[rowIx] <- mean(sapply(1:length(cols),
function(colIx) {zscores[[colIx]][rowIx]}))
}
# Add new vector to the table, and print out the new table.
dt[,d:=avg]
print(dt)
This gives what you might expect.
a b c d
1: 1 5 2 -0.39089105
2: 2 6 3 0.43643578
3: 3 3 4 -0.04554473
scale can be applied to matrix(-like) object, you can get desired output by
> set(dt, NULL, 'd', rowMeans(scale(dt[, cols, with = F])))
> dt
a b c d
1: 1 5 2 -0.39089105
2: 2 6 3 0.43643578
3: 3 3 4 -0.04554473

Changing dataframes in Loop r

I have many dataframes that I would like to run though a code. Is there a way to change the dataframe name in a loop?
df01$x = rnorm(100)
df02$x = rnorm(100)+2
df03$x = rnorm(100)*2
dflist <- c("df01",
"df02",
"df03")
for (i in 1:length(dflist){
{
#complete tasks by changing df name in existing code
ifelse([[i]]$x > 0,1,[[i]]$x)
}
#I want to do this for a number of different fuctions, so it is best to change the df name before "$"
df[[i]]$Varible = aggregate(df$Varible, .. ,..)}
Consider interacting with collection of data frames in a list and not passing string literals of its names. Within an lapply() function, you can handle your operations, and then even convert each back into individual dataframe objects:
df01 <- data.frame(x=rnorm(100))
df02 <- data.frame(x=rnorm(100)+2)
df03 <- data.frame(x=rnorm(100)*2)
# LIST OF DATAFRAMES (NAMED WITH LITERALS)
dfList <- list(df01=df01, df02=df02, df03=df03)
head(dfList[[1]])
# x
# 1 1.3440091
# 2 0.5838570
# 3 -0.2377477
# 4 -1.5059733
# 5 -1.0605490
# 6 -0.8122972
# ITERATIVELY RUN COLUMN OPERATION
dfList <- lapply(dfList, function(df) {
df$x <- ifelse(df$x > 0, 1, df$x)
return(df)
})
# CONVERT LIST ITEMS INTO INDIVIDUAL ENVIRON OBJECTS
list2env(dfList, envir=.GlobalEnv)
head(df01)
# x
# 1 1.0000000
# 2 1.0000000
# 3 -0.2377477
# 4 -1.5059733
# 5 -1.0605490
# 6 -0.8122972

cor.test into data.frame in R

consider the following example:
require(MuMIn)
data(Cement)
d <- data.frame(Cement)
idx <- seq(11,13)
cor1 <- list()
for (i in 1:length(idx)){
d2 <- d[1:idx[i],]
cor1[[i]] <- cor.test(d2$X1,d2$X2, method = "pearson")
}
out <- lapply(cor1, function(x) c(x$estimate, x$conf.int, x$p.value))
Here I calculate the correlation for a dataset within an iteration loop.
I know want to generate one data.frame made up of the values in the list 'out'. I try using
df <- do.call(rbind.data.frame, out)
but the result does not seem right:
> df
c.0.129614123011664..0.195326511912326..0.228579470307565.
1 0.1296141
2 0.1953265
3 0.2285795
c..0.509907346173941...0.426370467476045...0.368861726657293.
1 -0.5099073
2 -0.4263705
3 -0.3688617
c.0.676861607564929..0.691690831088494..0.692365536706126.
1 0.6768616
2 0.6916908
3 0.6923655
c.0.704071702633775..0.542941653020805..0.452566184329491.
1 0.7040717
2 0.5429417
3 0.4525662
This is not what I am after.
How can I generate a data.frame that has the first column expressing which list the cor.test was calcuated i.e. 1 to 3 in this case, the second column referring to the $estimate and then $conf.int and %p.value resulting in a five column data.frame.
Is this what you're trying to do? Your question is a bit hard to understand. Is a column of indices from the list really necessary? The whole first column will be exactly the same as the row names (which appear on the left-hand side).
> D <- data.frame(cbind(index = seq(length(out)), do.call(rbind, out)))
> names(D)[2:ncol(D)] <- c('estimate', paste0('conf.int', 1:2), 'p.value')
> D
index estimate conf.int1 conf.int2 p.value
1 1 0.1296141 -0.5099073 0.6768616 0.7040717
2 2 0.1953265 -0.4263705 0.6916908 0.5429417
3 3 0.2285795 -0.3688617 0.6923655 0.4525662
It's not entirely clear what you're asking ... you have there such a data frame, just without reasonable column names. You can simplify your code to ..
ctests <- lapply(idx, function(x) cor.test(d[1:x,"X1"], d[1:x, "X2"]))
ctests <- lapply(ctests, "[", c("estimate", "conf.int", "p.value"))
as.data.frame(do.call(rbind, lapply(ctests, unlist)))
# estimate.cor conf.int1 conf.int2 p.value
# 1 0.1296141 -0.5099073 0.6768616 0.7040717
# 2 0.1953265 -0.4263705 0.6916908 0.5429417
# 3 0.2285795 -0.3688617 0.6923655 0.4525662
Is this what you need?

r create a column that contains the objects names inside a lapply function

I would like to create a column that contains the objects names inside a lapply function, as a proxy I call it name.of.x.as.strig.function(), unfortunately I am not sure how to do it, maybe a combination of assign, do.call and paste. But so far using this function only led my into deeper troubles, I am quite sure there is a more R like solution.
# generates a list of dataframes,
data <- list(data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)),data.frame(c(1,2),c(3,3)))
# assigns names to dataframe
names(data) <- list("one","two", "tree", "four")
# subsets the second column into the object data.anova
data.anova <- lapply(data, function(x){x <- x[[2]];
return(matrix(x))})
This should allow me to create a column inside the dataframe that contains its name, for all matrices inside the list
data.anova <- lapply(data, function(x){
x$id <- name.of.x.as.strig.function(x)
return(x)})
I would like to retrieve:
3 one
3 one
3 two
3 two
...
Any input is highly appreciated.
Search history: function to retrieve object name as string, R get name of an object inside lapply...
Can it be that you are just looking for stack?
stack(lapply(data, `[[`, 2))
# values ind
# 1 3 one
# 2 3 one
# 3 3 two
# 4 3 two
# 5 3 tree
# 6 3 tree
# 7 3 four
# 8 3 four
(Or, using your original approach: stack(lapply(data, function(x) {x <- x[[2]]; x})))
If this is the case, melt from "reshape2" would also work.
Loop through the indices of data.anova, and use that to fetch both the data and the names:
data.anova <- lapply(seq_along(data.anova), function(i){
x <- as.data.frame(data.anova[[i]])
x$id <- names(data.anova)[i]
return(x)})
This produces:
# [[1]]
# V1 id
# 1 3 one
# 2 3 one
# [[2]]
# V1 id
# 1 3 two
# 2 3 two
# [[3]]
# V1 id
# 1 3 tree
# 2 3 tree
# [[4]]
# V1 id
# 1 3 four
# 2 3 four

Conditional lapply

So I have a bunch of data frames in a list object. Frames are organised such as
ID Category Value
2323 Friend 23.40
3434 Foe -4.00
And I got them into a list by following this topic. I can also run simple functions on them as shown in this topic.
Now I am trying to run a conditional function with lapply, and I'm running into trouble. In some tables the 'ID' column has a different name (say, 'recnum'), and I need to tell lapply to go through each data frame, check if there is a column named 'recnum', and change its name to 'ID', as in
colnr <- which(names(x) == "recnum"
if (length(colnr > 0)) {names(x)[colnr] <- "ID"}
But I'm running into trouble with local scope and who knows what. Any ideas?
Use the rename function from plyr; it renames by name, not position:
x <- data.frame(ID = 1:2,z=1:2)
y <- data.frame('recnum' = 1:2,z=3:4)
.list <- list(x,y)
library(plyr)
lapply(.list, rename, replace = c('recnum' = 'ID'))
[[1]]
ID z
1 1 1
2 2 2
[[2]]
ID z
1 1 3
2 2 4
Your original code works fine:
foo <- function(x){
colnr <- which(names(x) == "recnum")
if (length(colnr > 0)) {names(x)[colnr] <- "ID"}
x
}
.list <- list(x,y)
lapply(.list, foo)
Not sure what your problem was.
If you look at the second part of mnel's answer, you can see that the function foo evaluates x as its last expression. Without that, if you try to change the names of the data.frames in your list directly from within the anonymous function passed to lapply, it will likely not work.
Just as an alternative, you could use gsub and avoid loading an additional package (although plyr is a nice package):
xx <- list(data.frame("recnum" = 1:3, "recnum2" = 1:3),
data.frame("ID" = 4:6, "hat" = 4:6))
lapply(xx, function(x){
names(x) <- gsub("^recnum$", "ID", names(x))
return(x)
})
# [[1]]
# ID recnum2
# 1 1 1
# 2 2 2
# 3 3 3
# [[2]]
# ID hat
# 1 4 4
# 2 5 5
# 3 6 6

Resources