Assign the results of do.call using cbind to data frames - r

I want to combine multiple sets of two data frames (a & a_1, b & b_1, etc.). Basically, I want to do what this question is asking. I created a list of my two data sets:
# create data
a <- c(1, 2, 3)
b <- c(2, 3, 4)
at0H0 <- data.frame(a, b)
c <- c(1, 2, 3)
d <- c(2, 3, 4)
at0H0_1 <- data.frame(c, d)
e <- c(1, 2, 3)
f <- c(2, 3, 4)
at0H1 <- data.frame(a, b)
g <- c(1, 2, 3)
h <- c(2, 3, 4)
at0H1_1 <- data.frame(c, d)
# create lists of names
names <- list("at0H0", "at0H1")
namesLPC <- list("at0H0_1", "at0H1_1")
# column bind the data frames?
dfList <- list(cbind(names, namesLPC))
do.call(cbind, dfList)
But now I need it to create a combined data frame for each pair. This do.call call just returns the names of the data frames, not the data frames themselves. Thanks!
(Edited to make reproducible code)

It's not super straightforward, but with a little editing to a joining function you can get there:
joinfun <- function(x) do.call(cbind, unname(mget(x,inherits=TRUE)))
lapply(Map(c, names, namesLPC), joinfun)
#[[1]]
# a b c d
#1 1 2 1 2
#2 2 3 2 3
#3 3 4 3 4
#
#[[2]]
# a b c d
#1 1 2 1 2
#2 2 3 2 3
#3 3 4 3 4
The Map function pairs up the dataset names as required:
Map(c, names, namesLPC)
#[[1]]
#[1] "at0H0" "at0H0_1"
#
#[[2]]
#[1] "at0H1" "at0H1_1"
The lapply then loops over each element of the above list, using mget (multiple get) to fetch the named objects into a list. For the first element:
unname(mget(c("at0H0","at0H0_1"),inherits=TRUE))
#[[1]]
# a b
#1 1 2
#2 2 3
#3 3 4
#
#[[2]]
# c d
#1 1 2
#2 2 3
#3 3 4
Finally, do.call(cbind, ...) puts this combined list back into a single data.frame:
do.call(cbind, unname(mget(c("at0H0","at0H0_1"),inherits=TRUE)))
# a b c d
#1 1 2 1 2
#2 2 3 2 3
#3 3 4 3 4
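If you would rather keep the results together than as separate objects, a minimal sketch along the same lines uses Map() with get() to build a named list of combined data frames. The vectors names_main and names_lpc are hypothetical stand-ins for the question's names and namesLPC (renamed so they don't mask base::names), and the sample data is simplified:

```r
# Sample data in the shape of the question
at0H0   <- data.frame(a = 1:3, b = 2:4)
at0H0_1 <- data.frame(c = 1:3, d = 2:4)
at0H1   <- data.frame(a = 1:3, b = 2:4)
at0H1_1 <- data.frame(c = 1:3, d = 2:4)

# Hypothetical name vectors, one entry per pair
names_main <- c("at0H0", "at0H1")
names_lpc  <- c("at0H0_1", "at0H1_1")

# For each pair of names, look up both data frames and bind their columns.
# Map() names the result after names_main because it is a character vector.
combined <- Map(function(x, y) cbind(get(x), get(y)), names_main, names_lpc)

combined$at0H0
```

Each combined data frame is then reached by its base name, e.g. combined$at0H1.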

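I've figured out a way to do it. A few notes: I have 360 data sets that I need to combine, which is why the loop runs over i in 1:360. The loop also names each combined data set from a vector of names (dataNames). Because names and namesLPC hold the data frames' names as strings, get() is needed to fetch the data frames themselves:
for (i in 1:360) {
  # get() looks up each data frame by name; cbind() then joins the pair
  assign(dataNames[i], cbind(get(names[[i]]), get(namesLPC[[i]])))
}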
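A loop-free variant of the same idea, sketched under the assumption that dataNames holds one output name per pair: build the combined data frames with Map(), name them with dataNames, and create them all at once with list2env(). The example writes into a fresh environment, out_env (hypothetical), so nothing in the workspace is overwritten; passing envir = .GlobalEnv instead would reproduce the loop's behavior.

```r
# Sample data mirroring the question (2 pairs instead of 360)
at0H0   <- data.frame(a = 1:3, b = 2:4)
at0H0_1 <- data.frame(c = 1:3, d = 2:4)
at0H1   <- data.frame(a = 1:3, b = 2:4)
at0H1_1 <- data.frame(c = 1:3, d = 2:4)

base_names <- c("at0H0", "at0H1")
lpc_names  <- c("at0H0_1", "at0H1_1")
dataNames  <- c("combined0", "combined1")  # hypothetical output names

# One combined data frame per pair, looked up by name with get()
combined <- Map(function(x, y) cbind(get(x), get(y)), base_names, lpc_names)
names(combined) <- dataNames

# Create one object per list element in the target environment
out_env <- new.env()
list2env(combined, envir = out_env)
ls(out_env)
```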

Related

Remove column name pattern in multiple dataframes in R

I have >100 dataframes loaded into R with column name prefixes in some but not all columns that I would like to remove. In the below example with 3 dataframes, I would like to remove the pattern x__ in the 3 dataframes but keep all the dataframe names and everything else the same. How could this be done?
df1 <- data.frame(`x__a` = rep(3, 5), `x__b` = seq(1, 5, 1), `x__c` = letters[1:5])
df2 <- data.frame(`d` = rep(5, 5), `x__e` = seq(2, 6, 1), `f` = letters[6:10])
df3 <- data.frame(`x__g` = rep(5, 5), `x__h` = seq(2, 6, 1), `i` = letters[6:10])
You could put the data frames in a list and use an anonymous function with gsub.
lst <- mget(ls(pattern='^df\\d$'))
lapply(lst, \(x) setNames(x, gsub('x__', '', names(x))))
# $df1
# a b c
# 1 3 1 a
# 2 3 2 b
# 3 3 3 c
# 4 3 4 d
# 5 3 5 e
#
# $df2
# d e f
# 1 5 2 f
# 2 5 3 g
# 3 5 4 h
# 4 5 5 i
# 5 5 6 j
#
# $df3
# g h i
# 1 5 2 f
# 2 5 3 g
# 3 5 4 h
# 4 5 5 i
# 5 5 6 j
If you have no use for the list, you can move the modified data frames back into .GlobalEnv with list2env, but I don't recommend it, since it overwrites the originals.
lapply(lst, \(x) setNames(x, gsub('x__', '', names(x)))) |> list2env(.GlobalEnv)
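One refinement, offered as a sketch: gsub('x__', '', ...) removes x__ anywhere in a name. If only the prefix should go, anchoring the pattern with ^ and using sub() is slightly safer. The example uses two of the question's data frames and a plain function(x) instead of the \(x) shorthand, which needs R 4.1 or later:

```r
df1 <- data.frame(x__a = rep(3, 5), x__b = seq(1, 5, 1), x__c = letters[1:5])
df2 <- data.frame(d = rep(5, 5), x__e = seq(2, 6, 1), f = letters[6:10])
lst <- list(df1 = df1, df2 = df2)

# Strip "x__" only where it starts a column name
cleaned <- lapply(lst, function(x) {
  names(x) <- sub("^x__", "", names(x))
  x
})
lapply(cleaned, names)
```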

Creating global environment objects from two lists

I am having an issue assigning variable objects to a list of data frames. For example,
df1 <- data.frame(a = 1, b = 1:10)
df2 <- data.frame(a = 2, b = 1:10)
df3 <- data.frame(a = 3, b = 1:10)
x <- c("a", "b", "c")
y <- list("df1", "df2", "df3")
My goal is to assign each data frame in the y list to the corresponding object name in x. I can do it longhand:
a <- y[[1]]
But I have many iterations. I have tried the following without any luck:
map2(x, y, function(x, y) x <- y)
and
map2(x, y, ~assign(x, y))
Appreciate any help!
We can unlist the list of object names (unlist(y)), get their values as a list with mget, name the list elements with the values of 'x' using setNames, and then use list2env to create the objects in the global environment (not recommended, though):
list2env(setNames(mget(unlist(y)), x), .GlobalEnv)
If we use map2 (from purrr), then we need to get the value of 'y' and also specify the environment in which to assign it:
library(purrr)
map2(x, y, ~ assign(.x, get(.y), envir = .GlobalEnv))
Output:
a
# a b
#1 1 1
#2 1 2
#3 1 3
#4 1 4
#5 1 5
#6 1 6
#7 1 7
#8 1 8
#9 1 9
#10 1 10
b
# a b
#1 2 1
#2 2 2
#3 2 3
#4 2 4
#5 2 5
#6 2 6
#7 2 7
#8 2 8
#9 2 9
#10 2 10
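For readers without purrr, a base-R loop does the same thing as the map2() call. This sketch writes into a fresh environment, target_env (hypothetical), rather than .GlobalEnv, so existing objects are left alone:

```r
df1 <- data.frame(a = 1, b = 1:10)
df2 <- data.frame(a = 2, b = 1:10)
df3 <- data.frame(a = 3, b = 1:10)
x <- c("a", "b", "c")
y <- list("df1", "df2", "df3")

target_env <- new.env()   # hypothetical target; use .GlobalEnv to match the answer
for (i in seq_along(x)) {
  # get() turns the name "df1" into the data frame df1 before assigning it
  assign(x[i], get(y[[i]]), envir = target_env)
}
head(target_env$b, 3)
```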
Unfortunately, the above options didn't work in my case. I ended up adapting akrun's suggestion, which worked:
names(y) <- x
list2env(y, envir=.GlobalEnv)
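For context, list2env() creates one object per list element and uses each element's value as-is. With y as defined in the question (a list of name strings), a, b and c would therefore become character strings such as "df1". The approach gives data frames when y holds the data frames themselves, as in this sketch (which writes to a fresh environment, target_env, rather than .GlobalEnv):

```r
df1 <- data.frame(a = 1, b = 1:10)
df2 <- data.frame(a = 2, b = 1:10)
df3 <- data.frame(a = 3, b = 1:10)
x <- c("a", "b", "c")

y <- list(df1, df2, df3)   # the data frames themselves, not their names
names(y) <- x

target_env <- new.env()
list2env(y, envir = target_env)
target_env$b$a[1]
```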

Combine vectors into data frame, using vector name as a column

library(dplyr)
I have a set of vectors:
Sp_A <- c("A",1,2,3,4,5,6,7,8)
Sp_B <- c("B",9,10,11,12,13,14,15,16)
Sp_C <- c("C",17,18,19,20,21,22,23,24)
which I have collected into a vector of their names:
list <- ls(pattern = "Sp_")
I want to loop over each vector named in list and make it into a data frame. I currently do this for one vector like this:
A_df <- select(data.frame(rep(Sp_A[1], each = 4), c(Sp_A[c(2,4,6,8)]), c(Sp_A[c(3,5,7,9)])), name = 1, var1 = 2, var2 = 3)
I have tried to make this operation into a for loop like this:
for(i in list) {
test[i] <- select(A_df <- data.frame(rep(i[1], each = 4),
c(i[c(2,4,6,8)]),
c(i[c(3,5,7,9)]),
name = 1, var1 = 2, var2 = 3))
}
but to no avail.
I have heard that I might be able to use apply() for this sort of thing but I don't know how.
Maybe this:
lapply(list,function(x) data.frame(name=get(x)[1],matrix(get(x)[-1],ncol = 2)))
[[1]]
name X1 X2
1 A 1 5
2 A 2 6
3 A 3 7
4 A 4 8
[[2]]
name X1 X2
1 B 9 13
2 B 10 14
3 B 11 15
4 B 12 16
[[3]]
name X1 X2
1 C 17 21
2 C 18 22
3 C 19 23
4 C 20 24
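Note that matrix() fills column by column, so X1 above holds 1, 2, 3, 4, whereas the question's select() call puts 1, 3, 5, 7 in var1 and 2, 4, 6, 8 in var2. A sketch that reproduces the question's pairing uses byrow = TRUE, and also converts the values back to numbers (the Sp_ vectors are character because they contain a letter). The column names follow the question; sp_names is a hypothetical rename of list, to avoid masking base::list:

```r
Sp_A <- c("A", 1, 2, 3, 4, 5, 6, 7, 8)
Sp_B <- c("B", 9, 10, 11, 12, 13, 14, 15, 16)
Sp_C <- c("C", 17, 18, 19, 20, 21, 22, 23, 24)
sp_names <- ls(pattern = "^Sp_")

dfs <- lapply(sp_names, function(x) {
  v <- get(x)
  # byrow = TRUE pairs consecutive values: (1, 2), (3, 4), ...
  m <- matrix(as.numeric(v[-1]), ncol = 2, byrow = TRUE)
  data.frame(name = v[1], var1 = m[, 1], var2 = m[, 2], stringsAsFactors = FALSE)
})
names(dfs) <- sp_names
dfs$Sp_A
```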
Or a simple for loop to assign the dataframes to objects:
for (x in seq_along(list)) {
assign(paste0("test",x),data.frame(name=get(list[x])[1],matrix(get(list[x])[-1],ncol = 2)))
}

List of lists in R into a data.frame - inconsistent variable names

I have a list of lists and I want to convert it into a data frame. The challenge is that some variable names are missing from some of the lists (not NA values; the variable is absent entirely).
To illustrate, from
my_list <- list()
my_list[[1]] <- list(a = 1, b = 2, c = 3)
my_list[[2]] <- list(a = 4, c = 6)
I would like to get
a b c
[1,] 1 2 3
[2,] 4 NA 6
Another option is
library(reshape2)
as.data.frame(acast(melt(my_list), L1~L2, value.var='value'))
# a b c
#1 1 2 3
#2 4 NA 6
Or, as @David Arenburg suggested, recast is a wrapper for melt/dcast:
recast(my_list, L1 ~ L2, value.var = 'value')[, -1]
# a b c
#1 1 2 3
#2 4 NA 6
You can use the bind_rows function from the dplyr package:
my_list <- list()
my_list[[1]] <- list(a = 1, b = 2, c = 3)
my_list[[2]] <- list(a = 4, c = 6)
dplyr::bind_rows(lapply(my_list, as.data.frame))
This outputs:
Source: local data frame [2 x 3]
a b c
1 1 2 3
2 4 NA 6
Another answer; this one requires converting the arguments to data.frames first:
library(plyr)
lista <- list(a=1, b=2, c =3)
listb <- list(a=4, c=6)
lista <- as.data.frame(lista)
listb <- as.data.frame(listb)
my_list <- list(lista, listb)
my_list <- do.call(rbind.fill, my_list)
my_list
a b c
1 1 2 3
2 4 NA 6
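If you'd rather avoid extra packages, a base-R sketch is to collect every name that appears in any sub-list, then build each row by looking those names up and filling in NA where one is missing. It assumes every value is a single number; the helper fill_row is hypothetical:

```r
my_list <- list(list(a = 1, b = 2, c = 3), list(a = 4, c = 6))

# Every variable name that appears in at least one sub-list, in order of appearance
all_names <- unique(unlist(lapply(my_list, names)))

# Turn one sub-list into a named numeric row, using NA for absent variables
fill_row <- function(el) {
  vapply(all_names, function(nm) {
    if (is.null(el[[nm]])) NA_real_ else as.numeric(el[[nm]])
  }, numeric(1))
}

result <- as.data.frame(do.call(rbind, lapply(my_list, fill_row)))
result
```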

In R, how can I access the first element of each level of a factor?

I have a data frame like this:
n = c(2, 2, 3, 3, 4, 4)
n <- as.factor(n)
s = c("a", "b", "c", "d", "e", "f")
df = data.frame(n, s)
df
n s
1 2 a
2 2 b
3 3 c
4 3 d
5 4 e
6 4 f
and I want to access the first element of each level of my factor (and have in this example a vector containing a, c, e).
I can get the first element of one level with
df$s[df$n == 2][1]
but it does not work for all levels:
df$s[df$n == levels(n)]
[1] a f
How would you do that?
And to go further, I’d like to modify my data frame to see which is the first element for each level at every occurrence. In my example, a new column should be:
n s rep firstelement
1 2 a a a
2 2 b c a
3 3 c e c
4 3 d a c
5 4 e c e
6 4 f e e
Edit. The first part of my answer addresses the original question, i.e. before "And to go further" (which was added by OP in an edit).
Another possibility, using duplicated. From ?duplicated: "duplicated() determines which elements of a vector or data frame are duplicates of elements with smaller subscripts."
Here we use !, the logical negation (NOT), to select not duplicated elements of 'n', i.e. first elements of each level of 'n'.
df[!duplicated(df$n), ]
# n s
# 1 2 a
# 3 3 c
# 5 4 e
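The same duplicated() idea also covers the "go further" part of the question, sketched here in base R: the rows flagged by !duplicated() give the first element of each level, and match() maps every row's level back to its group's first element. The example builds s as character (stringsAsFactors = FALSE) so the new column holds plain strings:

```r
df <- data.frame(n = factor(c(2, 2, 3, 3, 4, 4)),
                 s = c("a", "b", "c", "d", "e", "f"),
                 stringsAsFactors = FALSE)

first_rows <- !duplicated(df$n)   # TRUE at the first row of each level
firsts <- df$s[first_rows]        # first element of each level: "a" "c" "e"

# For each row, find its group's position and take that group's first element
df$firstelement <- firsts[match(df$n, df$n[first_rows])]
df
```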
Update: I didn't see your "And to go further" edit until now. My first suggestion would definitely be to use ave, as already proposed by @thelatemail and @sparrow. But just to dig around in the R toolbox and show you an alternative, here's a dplyr way.
Group the data by n and use mutate to create a new variable 'first' holding the first element of s (s[1]):
library(dplyr)
df %>%
  group_by(n) %>%
  mutate(first = s[1])
# n s first
# 1 2 a a
# 2 2 b a
# 3 3 c c
# 4 3 d c
# 5 4 e e
# 6 4 f e
Or go all in with dplyr convenience functions and use first instead of [1]:
df %>%
  group_by(n) %>%
  mutate(first = first(s))
A dplyr solution for your original question would be to use summarise:
df %>%
  group_by(n) %>%
  summarise(first = first(s))
# n first
# 1 2 a
# 2 3 c
# 3 4 e
Here is an approach using match:
df$s[match(levels(n), df$n)]
EDIT: Maybe this looks a bit confusing ...
To get a column which lists the first elements you could use match twice (but with x and table arguments swapped):
df$firstelement <- df$s[match(levels(n), df$n)[match(df$n, levels(n))]]
df$firstelement
# [1] a a c c e e
# Levels: a b c d e f
Let's look at this in detail:
## this returns the first matching elements
match(levels(n), df$n)
# [1] 1 3 5
## when we swap the x and table argument in match we get the level index
## for each df$n (the duplicated indices are important)
match(df$n, levels(n))
# [1] 1 1 2 2 3 3
## results in
c(1, 3, 5)[c(1, 1, 2, 2, 3, 3)]
# [1] 1 1 3 3 5 5
df$s[c(1, 1, 3, 3, 5, 5)]
# [1] a a c c e e
# Levels: a b c d e f
The function ave is useful in these cases:
df$firstelement = ave(df$s, df$n, FUN = function(x) x[1])
df
n s firstelement
1 2 a a
2 2 b a
3 3 c c
4 3 d c
5 4 e e
6 4 f e
In this case I prefer the plyr package; it gives more freedom to manipulate the data.
library(plyr)
ddply(df,.(n),function(subdf){return(subdf[1,])})
n s
1 2 a
2 3 c
3 4 e
You could also use data.table
library(data.table)
dt = as.data.table(df)
dt[, list(firstelement = s[1]), by=n]
which would get you:
n firstelement
1: 2 a
2: 3 c
3: 4 e
The by=n part groups everything by each value of n, so s[1] gets the first element of each of those groups.
To get this as an extra column you could do:
dt[, newcol := s[1], by=n]
dt
# n s newcol
#1: 2 a a
#2: 2 b a
#3: 3 c c
#4: 3 d c
#5: 4 e e
#6: 4 f e
So this just takes the value of s from the first row of each group and assigns it to a new column.
df$s[sapply(levels(n), function(particular.level) { which(df$n == particular.level)[1]})]
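I believe your problem is that you are comparing two vectors, df$n and levels(n). vector == vector compares element by element, recycling the shorter vector, and it only appears to work here because the length of df$n is a multiple of the length of levels(n). It does not test whether each value is one of the levels.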
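The recycling is easy to see directly. In this sketch, levels(n) is recycled across the six values of n, so position 1 is compared with "2", position 2 with "3", position 3 with "4", position 4 with "2" again, and so on; only positions 1 and 6 match, which is why the question's attempt returned a and f. %in% is the membership test that was probably intended:

```r
n <- factor(c(2, 2, 3, 3, 4, 4))
s <- c("a", "b", "c", "d", "e", "f")
lv <- levels(n)                # "2" "3" "4"

# Elementwise comparison with recycling: n[i] is compared with lv[(i - 1) %% 3 + 1]
which(n == lv)                 # 1 6
s[n == lv]                     # "a" "f"

# Membership test: every value of n is one of the levels
all(n %in% lv)                 # TRUE
```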
Surprised not to see this classic in the answer stream yet.
do.call(rbind, lapply(split(df, df$n), function(x) x[1,]))
## n s
## 2 2 a
## 3 3 c
## 4 4 e
