I am not sure whether this question is too basic, but I haven't found an answer despite searching Google for quite some time, so I have to ask here.
Suppose I want to create a list out of data frames (df1 and df2). How can I use the name of each data frame as the list "index" instead of a number? That is, how do I get [[df1]] instead of [[1]] and [[df2]] instead of [[2]]?
list(structure(list(a = 1:10, b = 1:10), .Names = c("a", "b"), row.names = c(NA,
-10L), class = "data.frame"), structure(list(b = 1:10, a = 1:10), .Names = c("b",
"a"), row.names = c(NA, -10L), class = "data.frame"))
OK, here is an entirely different way to ask this question, to hopefully make things clearer ;)
I have three data frames:
weguihl <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
raeg <- structure(list(b = 1:3, a = 1:3), .Names = c("b", "a"), row.names = c(NA, -3L), class = "data.frame")
awezilf <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
I want to create a list out of them:
li <- list(weguihl, raeg, awezilf)
But now I have the problem that, unless I remember the order of the data frames, I do not know which data frame is which in the list:
> li
[[1]]
a b
1 1 1
2 2 2
3 3 3
[[2]]
b a
1 1 1
2 2 2
3 3 3
[[3]]
a b
1 1 1
2 2 2
3 3 3
Thus I'd prefer this output
> li
[[weguihl]]
a b
1 1 1
2 2 2
3 3 3
[[raeg]]
b a
1 1 1
2 2 2
3 3 3
[[awezilf]]
a b
1 1 1
2 2 2
3 3 3
How do I get there?
You could potentially achieve this with mget on a clean global environment.
Something like
Clean the global environment (note that this deletes every object in your workspace)
rm(list = ls())
Your data frames
weguihl <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
raeg <- structure(list(b = 1:3, a = 1:3), .Names = c("b", "a"), row.names = c(NA, -3L), class = "data.frame")
awezilf <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
Running mget, which returns a list named after the objects it fetched
li <- mget(ls(), .GlobalEnv)
li
# $awezilf
# a b
# 1 1 1
# 2 2 2
# 3 3 3
#
# $raeg
# b a
# 1 1 1
# 2 2 2
# 3 3 3
#
# $weguihl
# a b
# 1 1 1
# 2 2 2
# 3 3 3
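If clearing the workspace is not an option, the names can also be supplied directly. The following is a minimal sketch, not part of the answer above: it either names the elements when building the list or passes the object names to mget as strings. The variable names li and li2 are only illustrative.

```r
weguihl <- data.frame(a = 1:3, b = 1:3)
raeg    <- data.frame(b = 1:3, a = 1:3)
awezilf <- data.frame(a = 1:3, b = 1:3)

# Option 1: name the elements explicitly when building the list
li <- list(weguihl = weguihl, raeg = raeg, awezilf = awezilf)

# Option 2: look the objects up by name; mget() names the result
# after the strings it was given, in the order given
li2 <- mget(c("weguihl", "raeg", "awezilf"))

names(li)   # element names instead of [[1]], [[2]], [[3]]
li$raeg     # access by name
```

Printing such a list shows $weguihl, $raeg and $awezilf as the element labels, and other objects in the workspace are left alone.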
I have a data set, and I want to check whether my variables all have unique values in a specific row.
Let's say I want to analyze row D.
My data:
Name F S T
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 6
> TRUE (because all three variables have unique values)
Second example
Name F S T
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 4
> FALSE (because F and T have the same value in row D)
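In base R, you can do:
f1 <- function(dat, ind) {
  # values of row `ind`, without the first (Name) column
  tmp <- unlist(dat[ind, -1])
  # TRUE if no value occurs more than once
  length(unique(tmp)) == length(tmp)
}
Testing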
> f1(df, 4)
[1] TRUE
> f1(df1, 4)
[1] FALSE
data
df <- structure(list(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5,
T = 3:6), class = "data.frame", row.names = c(NA, -4L))
df1 <- structure(list(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5,
T = c(3L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA,
-4L))
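As a variation on the function above, here is a small sketch that uses anyDuplicated instead of comparing lengths, and then applies the same check to every row at once. The name row_is_unique is hypothetical, and the data frames are rebuilt with data.frame() only to keep the example self-contained.

```r
df  <- data.frame(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5, T = 3:6)
df1 <- data.frame(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5,
                  T = c(3L, 4L, 5L, 4L))

# hypothetical helper: TRUE if row `ind` has no repeated value
# (the first column, Name, is ignored)
row_is_unique <- function(dat, ind) {
  vals <- unlist(dat[ind, -1])
  anyDuplicated(vals) == 0L
}

row_is_unique(df, 4)
row_is_unique(df1, 4)

# the same check for every row of df1
all_rows <- apply(df1[-1], 1, function(r) anyDuplicated(r) == 0L)
all_rows
```

anyDuplicated returns 0 when there is no duplicate, so comparing against 0L gives the TRUE/FALSE answer directly.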
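You can use dplyr for this. Note that this checks whether every column holds only distinct values across all rows, rather than checking a single row:
library(dplyr)
df %>%
  summarize_at(c(2:ncol(.)), n_distinct) %>%
  summarize(if_all(.fns = ~ .x == nrow(df)))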
I want to ask: how do I merge these two data frames?
df1:
Name Type Price
A 1 NA
B 2 2.5
C 3 2.0
df2:
Name Type Price
A 1 1.5
D 2 2.5
E 3 2.0
As you can see, both data frames have the same column names and share one row with the same value in "Name" (A), but df1 is missing the price for that row whereas df2 has it. I want to achieve the output below, where rows are merged if the value in "Name" is the same:
Name Type Price
A 1 1.5
B 2 2.5
C 3 2.0
D 2 2.5
E 3 2.0
We could do a full_join on df1 and df2 by Name and then use coalesce on Type and Price to get the first non-NA value from those columns.
library(dplyr)
full_join(df1, df2, by = 'Name') %>%
mutate(Type = coalesce(Type.x, Type.y),
Price = coalesce(Price.x, Price.y)) %>%
select(names(df1))
# Name Type Price
#1 A 1 1.5
#2 B 2 2.5
#3 C 3 2.0
#4 D 2 2.5
#5 E 3 2.0
And similarly in base R:
transform(merge(df1, df2, by = 'Name', all = TRUE),
Price = ifelse(is.na(Price.x), Price.y, Price.x),
Type = ifelse(is.na(Type.x), Type.y, Type.x))[names(df1)]
data
df1 <- structure(list(Name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), Type = 1:3, Price = c(NA, 2.5, 2)),
class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(Name = structure(1:3, .Label = c("A", "D", "E"
), class = "factor"), Type = 1:3, Price = c(1.5, 2.5, 2)),
class = "data.frame", row.names = c(NA, -3L))
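To see the coalescing step in isolation, here is a small base R sketch. The helper coalesce2 is hypothetical (it is not a dplyr function), and the two vectors are made up to mimic what Price.x and Price.y would look like after a full join.

```r
# made-up stand-ins for the Price.x and Price.y columns after a full join
price_x <- c(NA, 2.5, 2.0, NA, NA)
price_y <- c(1.5, NA, NA, 2.5, 2.0)

# hypothetical helper: take x where it is present, otherwise fall back to y
coalesce2 <- function(x, y) ifelse(is.na(x), y, x)

price <- coalesce2(price_x, price_y)
price
```

This is the same ifelse(is.na(...)) pattern the base R answer uses inside transform.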
Seems like you want to rbind the data frames together, then remove rows with NA values for Price, and order by Name.
library(data.table)
setDT(rbind(df1, df2))[!is.na(Price)][order(Name)]
# Name Type Price
# 1: A 1 1.5
# 2: B 2 2.5
# 3: C 3 2.0
# 4: D 2 2.5
# 5: E 3 2.0
Here is a base R solution using merge + complete.cases
dfout <- subset(u <- merge(df1,df2,all= TRUE),complete.cases(u))
which yields
> dfout
Name Type Price
1 A 1 1.5
3 B 2 2.5
4 C 3 2.0
5 D 2 2.5
6 E 3 2.0
DATA
df1 <- structure(list(Name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), Type = 1:3, Price = c(NA, 2.5, 2)),
class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(Name = structure(1:3, .Label = c("A", "D", "E"
), class = "factor"), Type = 1:3, Price = c(1.5, 2.5, 2)),
class = "data.frame", row.names = c(NA, -3L))
I have the following list containing multiple data frames.
> dput(dfs)
structure(list(a = structure(list(x = 1:4, a = c(0.114304427057505,
0.202305722748861, 0.247671527322382, 0.897279736353084)), .Names = c("x",
"a"), row.names = c(NA, -4L), class = "data.frame"), b = structure(list(
x = 1:3, b = c(0.982652948237956, 0.694535500137135, 0.0617770322132856
)), .Names = c("x", "b"), row.names = c(NA, -3L), class = "data.frame"),
c = structure(list(x = 1:2, c = c(0.792271690675989, 0.997932326048613
)), .Names = c("x", "c"), row.names = c(NA, -2L), class = "data.frame")), .Names = c("a",
"b", "c"))
Here I want to change the first column name of each data frame.
> dfs
$a
x a
1 1 0.1143044
2 2 0.2023057
3 3 0.2476715
4 4 0.8972797
$b
x b
1 1 0.98265295
2 2 0.69453550
3 3 0.06177703
$c
x c
1 1 0.7922717
2 2 0.9979323
I am using the following function
> lapply(dfs,function(x){ names(x)[1] <- 'sec';x})
$a
sec a
1 1 0.1143044
2 2 0.2023057
3 3 0.2476715
4 4 0.8972797
$b
sec b
1 1 0.98265295
2 2 0.69453550
3 3 0.06177703
$c
sec c
1 1 0.7922717
2 2 0.9979323
It works, but when I print the original list again, the column names have not changed.
How do I assign the result to the original list?
Thank you.
You have to assign the result of lapply back to the variable, because lapply returns a new list and leaves the original untouched:
dfs <- lapply(dfs, function(x) {
  names(x)[1] <- 'sec'
  return(x)
})
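The reason is that R functions work on copies of their arguments, so the renaming inside lapply never touches the original list. A minimal sketch with made-up data (the names renamed and before are only illustrative), including a for loop as an alternative that modifies the list in place:

```r
dfs <- list(
  a = data.frame(x = 1:2, a = c(0.1, 0.2)),
  b = data.frame(x = 1:3, b = c(0.3, 0.4, 0.5))
)

# lapply returns a modified copy; dfs itself is not changed
renamed <- lapply(dfs, function(d) {
  names(d)[1] <- "sec"
  d
})

before <- names(dfs$a)   # the original still has "x" as first column
names(renamed$a)         # the copy has "sec"

# alternative: a loop that assigns into the list directly
for (nm in names(dfs)) {
  names(dfs[[nm]])[1] <- "sec"
}
names(dfs$b)
```

Either way, the change only sticks once something is assigned back to dfs.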
I have this df1:
A B C
1 2 3
5 7 9
where A, B and C are the column names.
I have another df2 with one column:
A
1
2
3
4
I would like to append df2 to each column of df1, creating this final data frame:
A B C
1 2 3
5 7 9
1 1 1
2 2 2
3 3 3
4 4 4
Is it possible to do this?
data.frame(sapply(df1, c, unlist(df2)), row.names = NULL)
# A B C
#1 1 2 3
#2 5 7 9
#3 1 1 1
#4 2 2 2
#5 3 3 3
#6 4 4 4
DATA
df1 = structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
df2 = structure(list(A = 1:4), .Names = "A", class = "data.frame", row.names = c(NA,
-4L))
We can replicate df2 for the number of columns of df1, unname it, then rbind it.
rbind(df1, unname(rep(df2, ncol(df1))))
# A B C
# 1 1 2 3
# 2 5 7 9
# 3 1 1 1
# 4 2 2 2
# 5 3 3 3
# 6 4 4 4
Data:
df1 <- structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(A = 1:4), .Names = "A", row.names = c(NA, -4L), class = "data.frame")
We can use base R methods
rbind(df1, setNames(as.data.frame(do.call(cbind, rep(list(df2$A), 3))), names(df1)))
# A B C
#1 1 2 3
#2 5 7 9
#3 1 1 1
#4 2 2 2
#5 3 3 3
#6 4 4 4
data
df1 <- structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(A = 1:4), .Names = "A", class = "data.frame",
row.names = c(NA, -4L))
Here is a base R method with rbind, rep, and setNames:
rbind(dat, setNames(data.frame(rep(dat1, ncol(dat))), names(dat)))
A B C
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
Edit: turns out data.frame isn't necessary:
rbind(dat, setNames(rep(dat1, ncol(dat)), names(dat)))
will work.
data
dat <-
structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
dat1 <-
structure(list(A = 1:4), .Names = "A", row.names = c(NA, -4L),
class = "data.frame")
I just love R, here is yet another Base R solution but with mapply:
data.frame(mapply(c, df1, df2))
Result:
A B C
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
Note:
No need to deal with colnames like almost all the other solutions. The key to why this works is that "mapply calls FUN for the values of ... [each element]
(re-cycled to the length of the longest ... [element])" (see ?mapply). In other words, df2$A is recycled to however many columns df1 has.
Data:
df1 = structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
df2 = structure(list(A = 1:4), .Names = "A", row.names = c(NA, -4L), class = "data.frame")
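To make the recycling visible, here is a short sketch using the same data, rebuilt with data.frame() for brevity. The intermediate name m is only illustrative. mapply first returns a matrix, which is why the answer wraps it in data.frame().

```r
df1 <- data.frame(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L))
df2 <- data.frame(A = 1:4)

# df1 is a list of 3 columns, df2 a list of 1 column;
# mapply recycles the single column of df2 across A, B and C
m <- mapply(c, df1, df2)

is.matrix(m)   # the simplified result is a matrix
dim(m)         # 2 rows from df1 plus 4 rows from df2, 3 columns

res <- data.frame(m)
res
```

The column names come from df1, the first list passed to mapply, so nothing needs renaming afterwards.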
Data:
df1 <- data.frame(A=c(1,5),
B=c(2,7),
C=c(3,9))
df2 <- data.frame(A=c(1,2,3,4))
Solution:
df2 <- matrix(rep(df2$A, ncol(df1)), ncol=ncol(df1))
colnames(df2) <- colnames(df1)
rbind(df1,df2)
Result:
A B C
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
A solution from purrr, which uses map_dfc to loop through all columns in df1 to combine all the elements with df2$A.
library(purrr)
map_dfc(df1, ~c(., df2$A))
# A tibble: 6 x 3
A B C
<int> <int> <int>
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
Data
df1 <- structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(A = 1:4), .Names = "A", class = "data.frame",
row.names = c(NA, -4L))
By analogy with @useR's excellent Base R answer, here's a tidyverse solution:
library(purrr)
map2_df(df1, df2, c)
A B C
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
Here are a few other (less desirable) options from when I first answered this question.
library(dplyr)
bind_rows(df1, df2 %>% mutate(B=A, C=A))
Or, if we want to dynamically get the number of columns and their names from df1:
bind_rows(df1,
df2[,rep(1,ncol(df1))] %>% setNames(names(df1)))
And one more Base R method:
rbind(df1, setNames(df2[,rep(1,ncol(df1))], names(df1)))
For the sake of completeness, here is a data.table approach which doesn't require handling column names:
library(data.table)
setDT(df1)[, lapply(.SD, c, df2$A)]
A B C
1: 1 2 3
2: 5 7 9
3: 1 1 1
4: 2 2 2
5: 3 3 3
6: 4 4 4
Note that the OP has described df2 to consist only of one column.
There is also a base R version of this approach:
data.frame(lapply(df1, c, df2$A))
A B C
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
This is similar to d.b's approach but doesn't require dealing with column names.
I have a list of some length (let's say 1000). Each element of the list is another list of length 2, and each element of that inner list is a data.table. The second element of each inner list might be an empty data.table.
I need to rbind() all the data.tables that are in the first position of each inner list (and likewise for the second position). I am currently doing the following:
DT1 = data.table()
DT2 = data.table()
for (i in seq_along(myList)) {
  DT1 = rbind(DT1, myList[[i]][[1]])
  DT2 = rbind(DT2, myList[[i]][[2]])
}
This works, but it is too slow. Is there a way I can avoid the for-loop?
Thank you in advance!
data.table has a dedicated, fast function for this: rbindlist
Cf: http://www.inside-r.org/packages/cran/data.table/docs/rbindlist
Edited:
Here is an example of code
library(data.table)
srcList=list(list(DT1=data.table(X=0),DT2=NULL),list(DT1=data.table(X=2),data.table(Y=3)))
# first have a list for all DT1s
DT1.list= lapply(srcList, FUN=function(el){el$DT1})
rbindlist(DT1.list)
X
1: 0
2: 2
Do this:
do.call("rbind", lapply(df.list, "[[", 1)) # for first list element
# x y
# 1 1 10
# 2 2 20
# 3 3 30
# 4 4 40
# 5 5 50
# 6 6 60
do.call("rbind", lapply(df.list, "[[", 2)) # for second list element
# x y
# 1 1 30
# 2 2 40
# 3 3 50
# 4 4 70
# 5 5 80
# 6 6 90
DATA
df.list=list(list(structure(list(x = 1:3, y = c(10, 20, 30)), .Names = c("x",
"y"), row.names = c(NA, -3L), class = "data.frame"), structure(list(
x = 1:3, y = c(30, 40, 50)), .Names = c("x", "y"), row.names = c(NA,
-3L), class = "data.frame")), list(structure(list(x = 4:6, y = c(40,
50, 60)), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame"),
structure(list(x = 4:6, y = c(70, 80, 90)), .Names = c("x",
"y"), row.names = c(NA, -3L), class = "data.frame")))
# df.list
# [[1]]
# [[1]][[1]]
# x y
# 1 1 10
# 2 2 20
# 3 3 30
# [[1]][[2]]
# x y
# 1 1 30
# 2 2 40
# 3 3 50
# [[2]]
# [[2]][[1]]
# x y
# 1 4 40
# 2 5 50
# 3 6 60
# [[2]][[2]]
# x y
# 1 4 70
# 2 5 80
# 3 6 90
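Since the question mentions that the second element may be empty, here is a hedged base R sketch that wraps the do.call approach in a small function and drops NULL or zero-row entries before binding. The name bind_position and the sample data are hypothetical. With data.table objects, rbindlist could be used in place of do.call(rbind, ...).

```r
# hypothetical helper: bind the data frames found at position `pos`
# of every inner list, skipping NULL and empty entries
bind_position <- function(lst, pos) {
  parts <- lapply(lst, `[[`, pos)
  parts <- Filter(function(d) !is.null(d) && nrow(d) > 0, parts)
  do.call(rbind, parts)
}

# made-up data: the second inner list has an empty second element
df.list <- list(
  list(data.frame(x = 1:3, y = c(10, 20, 30)),
       data.frame(x = 1:3, y = c(30, 40, 50))),
  list(data.frame(x = 4:6, y = c(40, 50, 60)),
       data.frame())
)

first  <- bind_position(df.list, 1)
second <- bind_position(df.list, 2)
first
second
```

Collecting the pieces first and binding once avoids growing the result inside a loop, which is what makes the original for loop slow.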