I have several data.frames in my Global Environment that I need to merge. Many of the data.frames have identical column names. I want to append a suffix to each column that marks its originating data.frame. Because I have many data.frames, I wanted to automate the process as in the following example.
df1 <- data.frame(id = 1:5,x = LETTERS[1:5])
df2 <- data.frame(id = 1:5,x = LETTERS[6:10])
obj <- ls()
for(o in obj){
s <- sub('df','',eval(o))
names(get(o))[-1] <- paste0(names(get(o))[-1],'.',s)
}
# Error in get(o) <- `*vtmp*` : could not find function "get<-"
But the individual pieces of the assignment work fine:
names(get(o))[-1]
# [1] "x"
paste0(names(get(o))[-1],'.',s)
# [1] "x.1"
I've used get in a similar way to write.csv each object to a file.
for(o in obj){
write.csv(get(o),file = paste0(o,'.csv'),row.names = F)
}
Any ideas why it's not working in the assignment to change the column names?
The error "could not find function get<-" is R telling you that you can't use <- to update a "got" object. You could probably use assign, but this code is already difficult enough to read. The better solution is to use a list.
From your example:
df1 <- data.frame(id = 1:5,x = LETTERS[1:5])
df2 <- data.frame(id = 1:5,x = LETTERS[6:10])
# put your data frames in a list
df_names = ls(pattern = "df[0-9]+")
df_names # make sure this is the objects you want
# [1] "df1" "df2"
df_list = mget(df_names)
# now we can use a simple for loop (or lapply, mapply, etc.)
for(i in seq_along(df_list)) {
names(df_list[[i]])[-1] =
paste(names(df_list[[i]])[-1],
sub('df', '', names(df_list)[i]),
sep = "."
)
}
# and the column names of the data frames in the list have been updated
df_list
# $df1
# id x.1
# 1 1 A
# 2 2 B
# 3 3 C
# 4 4 D
# 5 5 E
#
# $df2
# id x.2
# 1 1 F
# 2 2 G
# 3 3 H
# 4 4 I
# 5 5 J
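The same renaming can be written without an explicit loop. The following is only a sketch of the "lapply, mapply, etc." alternative mentioned in the comment above; it uses Map() to walk the data frames and their names in parallel, and rebuilds df_list inline so that it runs on its own.

```r
df_list <- list(
  df1 = data.frame(id = 1:5, x = LETTERS[1:5]),
  df2 = data.frame(id = 1:5, x = LETTERS[6:10])
)

# Map() passes each data frame together with its list name
df_list <- Map(function(d, nm) {
  # suffix every column except the first (id) with the number from the name
  names(d)[-1] <- paste(names(d)[-1], sub("df", "", nm), sep = ".")
  d
}, df_list, names(df_list))

names(df_list$df2)
# [1] "id"  "x.2"
```

Map() keeps the names of its first list argument, so the result is still addressable as df_list$df1 and df_list$df2.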
It's also now easy to merge them:
Reduce(f = merge, x = df_list)
# id x.1 x.2
# 1 1 A F
# 2 2 B G
# 3 3 C H
# 4 4 D I
# 5 5 E J
For more discussion and examples, see How do I make a list of data frames?
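For completeness, the assign route mentioned at the start of this answer might look roughly like the sketch below. It assumes the object names are known up front (here hard-coded as "df1" and "df2" rather than taken from ls()), and it works by copying each object, renaming the copy, and writing it back under the same name.

```r
df1 <- data.frame(id = 1:5, x = LETTERS[1:5])
df2 <- data.frame(id = 1:5, x = LETTERS[6:10])

for (o in c("df1", "df2")) {
  d <- get(o)                 # fetch a copy of the object by name
  s <- sub("df", "", o)       # "1" or "2"
  names(d)[-1] <- paste0(names(d)[-1], ".", s)
  assign(o, d)                # overwrite the original with the renamed copy
}

names(df1)
# [1] "id"  "x.1"
```

This avoids the "get<-" error because the replacement happens on an ordinary local variable, but the list approach is still easier to read and extend.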
Using setnames from library(data.table) you can do
for(o in obj) {
oldnames = names(get(o))[-1]
newnames = paste0(oldnames, ".new")
setnames(get(o), oldnames, newnames)
}
You can use eval, which evaluates an R expression in a specified environment.
df1 <- data.frame(id = 1:5,x = LETTERS[1:5])
df2 <- data.frame(id = 1:5,x = LETTERS[6:10])
obj <- ls()
for(o in obj) {
s <- sub('df', '', o)
new_name <- paste0(names(get(o))[-1], '.', s)
eval(parse(text = paste0('names(', o, ')[-1] <- ', substitute(new_name))))
}
This modifies df1 and df2 in place, for example df1:
id x.1
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
Related
I want to add columns based on a function in all lists in list.
list1 <- list(A = c(1:10), B = c(rnorm(1:10)), C = c(rnorm(1:10)), D = c(rnorm(1:10)))
list2 <- list(A = c(1:10), B = c(rnorm(1:10)), C = c(rnorm(1:10)), D = c(rnorm(1:10)))
both_lists <- list(list1,list2)
both_lists <- lapply(both_lists, function(x) ... )
For one dataframe (not in a list) I normally use:
df1 <- data.frame(A = c(1:10), B = c(rnorm(1:10)), C = c(rnorm(1:10)), D = c(rnorm(1:10)))
df2 <- data.frame(A = c(1:10), B = c(rnorm(1:10)), C = c(rnorm(1:10)), D = c(rnorm(1:10)))
df1 %>% mutate(max = do.call(pmax, c(select(., c(2:4)))))
But how do I do this for the lists in the list? I want to do 2 things to all the lists in my list:
find the maximum of columns 2-4
add that maximum as a separate row
Oh, and could anyone also tell me how to change the name of a list inside the list? (So changing the name of list1 to a value taken from the data, e.g. setting the name of the list to df1[[1]][1], and repeating that with lapply for every list in the list?)
With lapply you can do it as follows:
lapply(both_lists, function(x){x[['max']] <- do.call(pmax, x[2:4]); x})
The output looks like this:
[[1]]
[[1]]$A
[1] 1 2 3 4 5 6 7 8 9 10
[[1]]$B
[1] 1.325128799 0.341702207 0.341139152 -0.630065889 0.799934566 0.427531770
[7] -1.492861023 2.643621022 0.008158055 -0.187956774
[[1]]$C
[1] -0.8535937 -0.1753520 1.1008905 -0.0385363 -1.6739434 0.2179597 -0.1300490 0.4177869
[9] 1.3066992 0.2369493
[[1]]$D
[1] 0.98472409 0.66930725 0.52449977 0.08553770 -1.81759549 -0.07564249 -0.63611958
[8] -1.19293507 -1.61571223 1.29777033
[[1]]$max
[1] 1.3251288 0.6693073 1.1008905 0.0855377 0.7999346 0.4275318 -0.1300490 2.6436210
[9] 1.3066992 1.2977703
[[2]]
...
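Because the data in the question is random, the output above will differ between runs. A small deterministic sketch (the values below are made up purely for illustration) makes it easier to check what the lapply call does:

```r
# hypothetical fixed data instead of rnorm(), so the result is reproducible
lst <- list(A = 1:3, B = c(0.5, -1, 2), C = c(1, 0, -3), D = c(-2, 4, 1))
both_lists <- list(lst, lst)

both_lists <- lapply(both_lists, function(x) {
  # element-wise maximum over elements 2 to 4 (B, C, D)
  x[["max"]] <- do.call(pmax, x[2:4])
  x
})

both_lists[[1]]$max
# [1] 1 4 2
```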
Assuming your data.frames df1 and df2 as shown in the OP are in a list named dfl:
library(dplyr)
library(magrittr)
dfl <- lapply(dfl, function(x){
x %<>% mutate(max = do.call(pmax, c(select(., c(2:4)))))
})
And if you want to set the names of the list elements as some value from the data.frames within, maybe something like this?
names(dfl) <- lapply(dfl, function(x){
x[2,2]
})
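If the names should come out as a plain character vector, a variant with vapply may be a little safer than lapply. This is only a sketch: the two small data frames and the choice of cell x[1, 1] are assumptions made for illustration.

```r
# hypothetical example data
dfl <- list(
  data.frame(A = c("x", "y"), B = 1:2),
  data.frame(A = c("p", "q"), B = 3:4)
)

# take the first cell of each data frame as that element's name
names(dfl) <- vapply(dfl, function(x) as.character(x[1, 1]), character(1))

names(dfl)
# [1] "x" "p"
```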
I hope this is what you actually meant because your question was a bit unclear to me. (Apologies if I am wrong.)
I have read
What is the most efficient way to cast a list as a data frame?
Convert a list to a data frame
I have a list with unequal column names which I am trying to convert to a data frame, with NA for the missing entries in the shorter rows. It is easy with tidyverse (for example with bind_rows), but this is for a low-level package that should use base R only.
mylist = list(
list(a = 3, b = "anton"),
list(a = 5, b = "bertha"),
list(a = 7, b = "caesar", d = TRUE)
)
# No problem with equal number of columns
do.call(rbind, lapply(mylist[1:2], data.frame))
# The list of my names
unique(unlist(lapply(mylist, names)))
# rbind does not like unequal numbers
do.call(rbind, lapply(mylist, data.frame))
Find the unique column names across the list, then, inside lapply, add the columns each element is missing (found with setdiff) and fill them with NA.
cols <- unique(unlist(sapply(mylist, names)))
do.call(rbind, lapply(mylist, function(x) {
x <- data.frame(x)
x[setdiff(cols, names(x))] <- NA
x
}))
# a b d
#1 3 anton NA
#2 5 bertha NA
#3 7 caesar TRUE
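Since this is meant for a package, the same idea could be wrapped in a small helper. The function name rbind_fill below is made up for this sketch and is not an existing base R function; it only repackages the steps shown above.

```r
# hypothetical helper: row-bind a list of named lists, filling gaps with NA
rbind_fill <- function(lst) {
  cols <- unique(unlist(lapply(lst, names)))
  rows <- lapply(lst, function(x) {
    x <- data.frame(x, stringsAsFactors = FALSE)
    x[setdiff(cols, names(x))] <- NA   # add the columns this row lacks
    x[cols]                            # same column order for every row
  })
  do.call(rbind, rows)
}

mylist <- list(
  list(a = 3, b = "anton"),
  list(a = 5, b = "bertha"),
  list(a = 7, b = "caesar", d = TRUE)
)
res <- rbind_fill(mylist)
```

Here res has three rows and the columns a, b and d, with d set to NA for the first two rows.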
Use indexes instead of columns and transpose it afterwards:
import pandas as pd
l1 = [1,1]
l2 = [2,2,2,2]
df = pd.DataFrame([l1,l2], index = ('l1', 'l2'))
df.T
# l1 l2
# 0 1 2
# 1 1 2
# 2 NaN 2
# 3 NaN 2
I would like to add a new column D to data.frames in a list that contains the first part of column B. But I'm not sure how to address elements within lists down to the column level?
create some data
df1 <- data.frame(A = "hey", B = "wass.7", C = "up")
df2 <- data.frame(A = "how", B = "are.1", C = "you")
dfList <- list(df1,df2)
desired output:
# a new column removing the last part of column B
[[1]]
A B C D
1 hey wass.7 up wass
[[2]]
A B C D
1 how are.1 you are
for each data frame I did this, which worked
df1$D<-sub('\\..*', '', df1$B)
in a function I tried this, which is probably not correctly addressing the columns and returns "unexpected symbol..."
dfList <- lapply(rapply(dfList, function(x)
x$D<-sub('\\..*', '', x$B) how = "list"),
as.data.frame)
the lapply(rapply) part is copied from Using gsub in list of dataframes with R
Check this out
lapply(dfList, function(x){
x$D <-sub('\\..*', '', x$B);
x
})
[[1]]
A B C D
1 hey wass.7 up wass
[[2]]
A B C D
1 how are.1 you are
The rapply solution does work. However, you needed a comma before the how argument to resolve the error. Additionally, you will NOT be able to add a new column this way; you can only replace existing ones. Since rapply is a recursive call, it runs sub on every element of the nested list, that is, across ALL columns of ALL data frames.
Otherwise use a simple lapply per @JilberUrbina's answer.
df1 <- data.frame(A = "hey", B = "wass.7", C = "up", stringsAsFactors = F)
df2 <- data.frame(A = "how", B = "are.1", C = "you", stringsAsFactors = F)
dfList <- list(df1,df2)
dfList <- lapply(rapply(dfList, function(x)
sub('\\..*', '', x), how = "list"),
as.data.frame)
dfList
# [[1]]
# A B C
# 1 hey wass up
# [[2]]
# A B C
# 1 how are you
I have a relatively large amount of data stored in a list of data frames with several columns.
For each element of the list I wish to check one column against a reference and if present extract the value held in another column of the same element and place in a new summary matrix.
e.g. with the following example code:
add1 = c("N1","N1","N1")
coords1 = c(1,2,3)
vals1 = c("a","b","c")
extra1 = c("x","y","x")
add2 = c("N2","N2","N2","N2")
coords2 = c(2,3,4,5)
vals2 = c("b","c","d","e")
extra2 = c("z","y","x","x")
add3 = c("N3","N3","N3")
coords3 = c(1,3,5)
vals3 = c("a","c","e")
extra3 = c("z","z","x")
df1 <- data.frame(add1, coords1, vals1, extra1)
df2 <- data.frame(add2, coords2, vals2, extra2)
df3 <- data.frame(add3, coords3, vals3, extra3)
list_all <- list(df1, df2, df3)
coordinate.extract <- unique(unlist(lapply(list_all, "[", 1)))
my_matrix <- matrix(0, ncol = length(list_all)
, nrow = (length(coordinate.extract)))
my_matrix_new <- cbind(as.character(coordinate.extract)
, my_matrix)
I would like to end up with:
my_matrix_new =  V1 V2 V3 V4
                 1  a     a
                 2  b  b
                 3  c  c  c
                 4     d
                 5     e  e
i.e. the 3rd column of each list element is chosen based on the value of the second column.
I hope this is clear.
Thanks,
Matt
I would use a data.frame as there are mixed classes. You may try merge with Reduce to get the expected output: select the 2nd and 3rd columns in each list element, rename the 2nd column so that it is the same across all list elements, merge, and, if needed, replace the NA elements with ''.
lst1 <- lapply(list_all, function(x) {names(x)[2] <- 'V1';x[2:3] })
res <- Reduce(function(...) merge(..., by='V1', all=TRUE), lst1)
res[-1] <- lapply(res[-1], as.character)
res[is.na(res)] <- ''
res
#   V1 vals1 vals2 vals3
# 1  1     a           a
# 2  2     b     b
# 3  3     c     c     c
# 4  4           d
# 5  5           e     e
We can change the column names
names(res) <- paste0('V', seq_along(res))
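If a character matrix is really what is wanted, as in the question, a different route is to look up each coordinate with match instead of merging. The sketch below is an alternative to the merge approach, not a restatement of it; it rebuilds the example data in compact form, and the column names add, coords and vals are simplified stand-ins for add1/coords1/vals1 and so on.

```r
list_all <- list(
  data.frame(add = "N1", coords = c(1, 2, 3),    vals = c("a", "b", "c")),
  data.frame(add = "N2", coords = c(2, 3, 4, 5), vals = c("b", "c", "d", "e")),
  data.frame(add = "N3", coords = c(1, 3, 5),    vals = c("a", "c", "e"))
)

# all coordinates that occur in any element, in ascending order
coords <- sort(unique(unlist(lapply(list_all, `[[`, "coords"))))

# one column per list element: the value at each coordinate, "" if absent
vals <- sapply(list_all, function(d) {
  v <- as.character(d$vals)[match(coords, d$coords)]
  v[is.na(v)] <- ""
  v
})

my_matrix_new <- cbind(as.character(coords), vals)
```

The result is a 5 x 4 character matrix whose first column holds the coordinates and whose remaining columns hold the values from each list element.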
Z = data.frame(var1 = c(1,2,3,4,5), var2 = LETTERS[1:5])
testfun <- function(x){
print(x) # prints the data
# but how to get names of the list coming in?
return(NULL)
}
res = lapply(Z, testfun)
I want to access variables "var1" and "var2" inside testfun. How do I retrieve those variables inside testfun? Does lapply even pass that information? colnames(x) does not work.
No, lapply doesn't pass this information to the function. You could lapply along the names and use subsetting to get the list content inside the function.
testfun <- function(nam, mylist){
print(nam) # prints the names
mylist[[nam]] #get list content using subsetting
}
res <- lapply(names(Z), testfun, mylist=Z)
# [1] "var1"
# [1] "var2"
res
# [[1]]
# [1] 1 2 3 4 5
#
# [[2]]
# [1] A B C D E
# Levels: A B C D E
Similar to @Roland's answer, I would just do the apply on 1:length(Z), and pass the list and the list names to the function.
nams <- names(Z)
testfun <- function(i,Z,nams){
print(Z[[i]])
print(nams[i])}
res <- lapply(1:length(Z),testfun,Z=Z,nams=nams)
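Another way to hand both the content and the name to the function, sketched here as an alternative to indexing, is Map(), which iterates over several arguments in parallel. The two-argument form of testfun and the string it returns are illustrative choices, not part of the original question.

```r
Z <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 = LETTERS[1:5])

# x is the column content, nam is the column name
testfun <- function(x, nam) {
  paste0(nam, ": ", length(x), " values")
}

res <- Map(testfun, Z, names(Z))
res$var1
# [1] "var1: 5 values"
```

Map() also keeps the names of Z on the result, so res$var1 and res$var2 can be used directly.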
If you just want to preserve the labels, you can use llply from the plyr package.
library(plyr)
testfun <- function(x){x}
res <- llply(Z,testfun)
res
result will be:
> res
$var1
[1] 1 2 3 4 5
$var2
[1] A B C D E
Levels: A B C D E