How to append group row into dataframe - r

I have this df1:
A B C
1 2 3
5 7 9
where A, B and C are the column names.
I have another df2 with one column:
A
1
2
3
4
I would like to append df2 to each column of df1, creating this final dataframe:
A B C
1 2 3
5 7 9
1 1 1
2 2 2
3 3 3
4 4 4
Is it possible to do this?

data.frame(sapply(df1, c, unlist(df2)), row.names = NULL)
# A B C
#1 1 2 3
#2 5 7 9
#3 1 1 1
#4 2 2 2
#5 3 3 3
#6 4 4 4
DATA
df1 = structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
df2 = structure(list(A = 1:4), .Names = "A", class = "data.frame", row.names = c(NA,
-4L))

We can replicate df2 for the number of columns of df1, unname it, then rbind it.
rbind(df1, unname(rep(df2, ncol(df1))))
# A B C
# 1 1 2 3
# 2 5 7 9
# 3 1 1 1
# 4 2 2 2
# 5 3 3 3
# 6 4 4 4
Data:
df1 <- structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(A = 1:4), .Names = "A", row.names = c(NA, -4L), class = "data.frame")

We can use base R methods
rbind(df1, setNames(as.data.frame(do.call(cbind, rep(list(df2$A), 3))), names(df1)))
# A B C
#1 1 2 3
#2 5 7 9
#3 1 1 1
#4 2 2 2
#5 3 3 3
#6 4 4 4
data
df1 <- structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(A = 1:4), .Names = "A", class = "data.frame",
row.names = c(NA, -4L))

Here is a base R method with rbind, rep, and setNames:
rbind(dat, setNames(data.frame(rep(dat1, ncol(dat))), names(dat)))
A B C
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
Edit: it turns out data.frame isn't necessary:
rbind(dat, setNames(rep(dat1, ncol(dat)), names(dat)))
will work.
data
dat <-
structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
dat1 <-
structure(list(A = 1:4), .Names = "A", row.names = c(NA, -4L),
class = "data.frame")

I just love R, here is yet another Base R solution but with mapply:
data.frame(mapply(c, df1, df2))
Result:
A B C
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
Note:
Unlike almost all the other solutions, there is no need to deal with colnames. The key to why this works is that "mapply calls FUN for the values of ... [each element] (re-cycled to the length of the longest...[element]" (see ?mapply). In other words, df2$A is recycled to however many columns df1 has.
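The recycling described above can be made visible by writing the same pairing out by hand. This is only a sketch on the question's data; the names combined and explicit are illustrative and not part of the original answer.

```r
df1 <- data.frame(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L))
df2 <- data.frame(A = 1:4)

# mapply() walks over the columns of df1. df2 has a single column, so that
# column is recycled and paired with each column of df1 in turn.
combined <- data.frame(mapply(c, df1, df2))

# The same pairing spelled out: append df2's only column to every df1 column
explicit <- data.frame(lapply(df1, function(col) c(col, df2[[1]])))

combined
explicit
```

Both objects should hold the same six rows. The recycling relies on df2 having exactly one column, as in the question.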
Data:
df1 = structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
df2 = structure(list(A = 1:4), .Names = "A", row.names = c(NA, -4L), class = "data.frame")

Data:
df1 <- data.frame(A=c(1,5),
B=c(2,7),
C=c(3,9))
df2 <- data.frame(A=c(1,2,3,4))
Solution:
df2 <- matrix(rep(df2$A, ncol(df1)), ncol=ncol(df1))
colnames(df2) <- colnames(df1)
rbind(df1,df2)
Result:
A B C
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4

A solution from purrr, which uses map_dfc to loop through all columns in df1 to combine all the elements with df2$A.
library(purrr)
map_dfc(df1, ~c(., df2$A))
# A tibble: 6 x 3
A B C
<int> <int> <int>
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
Data
df1 <- structure(list(A = c(1L, 5L), B = c(2L, 7L), C = c(3L, 9L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(A = 1:4), .Names = "A", class = "data.frame",
row.names = c(NA, -4L))

By analogy with @useR's excellent Base R answer, here's a tidyverse solution:
library(purrr)
map2_df(df1, df2, c)
A B C
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
Here are a few other (less desirable) options from when I first answered this question.
library(dplyr)
bind_rows(df1, df2 %>% mutate(B=A, C=A))
Or, if we want to dynamically get the number of columns and their names from df1:
bind_rows(df1,
df2[,rep(1,ncol(df1))] %>% setNames(names(df1)))
And one more Base R method:
rbind(df1, setNames(df2[,rep(1,ncol(df1))], names(df1)))

For the sake of completeness, here is a data.table approach that doesn't require handling column names:
library(data.table)
setDT(df1)[, lapply(.SD, c, df2$A)]
A B C
1: 1 2 3
2: 5 7 9
3: 1 1 1
4: 2 2 2
5: 3 3 3
6: 4 4 4
Note that the OP has described df2 to consist only of one column.
There is also a base R version of this approach:
data.frame(lapply(df1, c, df2$A))
A B C
1 1 2 3
2 5 7 9
3 1 1 1
4 2 2 2
5 3 3 3
6 4 4 4
This is similar to d.b's approach but doesn't require dealing with column names.

Related

How to remove rows if values from a specified column in data set 1 does not match the values of the same column from data set 2 using dplyr

I have 2 data sets, both include ID columns with the same IDs. I have already removed rows from the first data set. For the second data set, I would like to remove any rows associated with IDs that do not match the first data set by using dplyr.
In other words, whatever is in DF2 must also be in DF1; if it is not, it must be removed from DF2.
For example:
DF1
ID X Y Z
1 1 1 1
2 2 2 2
3 3 3 3
5 5 5 5
6 6 6 6
DF2
ID A B C
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
DF2 once rows have been removed
ID A B C
1 1 1 1
2 2 2 2
3 3 3 3
5 5 5 5
6 6 6 6
I used anti_join() which shows me the difference in rows but I cannot figure out how to remove any rows associated with IDs that do not match the first data set by using dplyr.
Try with paste
i1 <- do.call(paste, DF2) %in% do.call(paste, DF1)
# if it is only to compare the 'ID' columns
i1 <- DF2$ID %in% DF1$ID
DF3 <- DF2[i1,]
DF3
ID A B C
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 5 5 5 5
5 6 6 6 6
DF4 <- DF2[!i1,]
DF4
ID A B C
4 4 4 4 4
7 7 7 7 7
data
DF1 <- structure(list(ID = c(1L, 2L, 3L, 5L, 6L), X = c(1L, 2L, 3L,
5L, 6L), Y = c(1L, 2L, 3L, 5L, 6L), Z = c(1L, 2L, 3L, 5L, 6L)), class = "data.frame", row.names = c(NA,
-5L))
DF2 <- structure(list(ID = 1:7, A = 1:7, B = 1:7, C = 1:7), class = "data.frame", row.names = c(NA,
-7L))
# Load package
library(dplyr)
# Load dataframes
df1 <- data.frame(
ID = 1:6,
X = 1:6,
Y = 1:6,
Z = 1:6
)
df2 <- data.frame(
ID = 1:7,
X = 1:7,
Y = 1:7,
Z = 1:7
)
# Include all rows in df1
df1 %>%
left_join(df2)
Joining, by = c("ID", "X", "Y", "Z")
ID X Y Z
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
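Since the question asks specifically for dplyr, a filtering join may be the most direct fit. The sketch below assumes the DF1 and DF2 from the question and that matching on the ID column alone is what is wanted; kept and dropped are illustrative names.

```r
library(dplyr)

DF1 <- data.frame(ID = c(1L, 2L, 3L, 5L, 6L), X = c(1L, 2L, 3L, 5L, 6L),
                  Y = c(1L, 2L, 3L, 5L, 6L), Z = c(1L, 2L, 3L, 5L, 6L))
DF2 <- data.frame(ID = 1:7, A = 1:7, B = 1:7, C = 1:7)

# semi_join() keeps the rows of DF2 whose ID also appears in DF1,
# without adding any columns from DF1
kept <- semi_join(DF2, DF1, by = "ID")

# anti_join() returns the complement: rows of DF2 with no match in DF1
dropped <- anti_join(DF2, DF1, by = "ID")

kept
dropped
```

filter(DF2, ID %in% DF1$ID) should give the same rows as semi_join() here.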

How to replace the values in a binary matrix with values from a dataframe?

The matrix I have looks something like this:
Plot A B C
1 1 0 0
2 1 0 1
3 1 1 0
And I have a dataframe that looks like this
A 5
B 4
C 2
What I would like to do is replace the "1" values in the matrix with the corresponding values in the dataframe, like this:
Plot A B C
1 5 0 0
2 5 0 2
3 5 4 0
Any suggestions on how to do this in R? Thank you!
An option with tidyverse
library(dplyr)
df1 %>%
mutate(across(all_of(df2$col1),
~ replace(.x, .x== 1, df2$col2[match(cur_column(), df2$col1)])))
Output:
Plot A B C
1 1 5 0 0
2 2 5 0 2
3 3 5 4 0
data
df1 <- structure(list(Plot = 1:3, A = c(1L, 1L, 1L), B = c(0L, 0L, 1L
), C = c(0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(col1 = c("A", "B", "C"), col2 = c(5, 4, 2)),
class = "data.frame", row.names = c(NA,
-3L))
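Because the matrix holds only 0 and 1, multiplying each column by its lookup value gives the same result as replacing the 1s. Below is a base R sketch under that assumption (it would not be correct for columns containing other values); cols, vals and out are illustrative names.

```r
df1 <- data.frame(Plot = 1:3, A = c(1L, 1L, 1L), B = c(0L, 0L, 1L),
                  C = c(0L, 1L, 0L))
df2 <- data.frame(col1 = c("A", "B", "C"), col2 = c(5, 4, 2))

# Columns of df1 that have an entry in the lookup table, in df1's order
cols <- intersect(names(df1), df2$col1)
# The replacement value for each of those columns, matched by name
vals <- df2$col2[match(cols, df2$col1)]

# Multiply every 0/1 column by its value: 1 becomes the value, 0 stays 0
out <- df1
out[cols] <- Map(`*`, df1[cols], vals)
out
```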

Map a function to two data frames of unequal lengths

For each row in df1 I would like to execute mult 10 times, once for each year in df2.
One option I can think of is to repeat df1 multiple times and join it to df2. But my actual data are much larger (~20k sections, 15 areas and 100 years), so I am looking for a more efficient way to do this.
# df1
section area a b c
1 1 1 0.1208916 0.7235306 0.7652636
2 2 1 0.8265642 0.2939602 0.6491496
3 1 2 0.9101611 0.7363248 0.1509295
4 2 2 0.8807047 0.5473221 0.6748055
5 1 3 0.2343558 0.2044689 0.9647333
6 2 3 0.4112479 0.9523639 0.1533197
----------
# df2
year d
1 1 0.7357432
2 2 0.4591575
3 3 0.3654561
4 4 0.1996439
5 5 0.2086226
6 6 0.5628826
7 7 0.4772953
8 8 0.8474007
9 9 0.8861693
10 10 0.6694851
mult <- function(a, b, c, d) {a * b * c * d}
The desired output would look something like this
section area year e
1 1 1 1 results of mult()
2 2 1 1 results of mult()
3 1 2 1 results of mult()
4 2 2 1 results of mult()
5 1 3 1 results of mult()
6 2 3 1 results of mult()
7 1 1 2 results of mult()
8 2 1 2 results of mult()
...
dput(df1)
structure(list(section = c(1L, 2L, 1L, 2L, 1L, 2L), area = c(1L,
1L, 2L, 2L, 3L, 3L), a = c(0.12089157756418, 0.826564211165532,
0.91016107192263, 0.880704707000405, 0.234355789143592, 0.411247851792723
), b = c(0.72353063733317, 0.293960151728243, 0.736324765253812,
0.547322086291388, 0.204468948533759, 0.952363904565573), c = c(0.765263637062162,
0.649149592733011, 0.150929539464414, 0.674805536167696, 0.964733332861215,
0.15331974090077)), out.attrs = list(dim = structure(2:3, .Names = c("section",
"area")), dimnames = list(section = c("section=1", "section=2"
), area = c("area=1", "area=2", "area=3"))), class = "data.frame", row.names = c(NA,
-6L))
dput(df2)
structure(list(year = 1:10, d = c(0.735743158031255, 0.459157506935298,
0.365456136409193, 0.199643932981417, 0.208622586680576, 0.562882597092539,
0.477295308141038, 0.847400720929727, 0.886169332079589, 0.669485098216683
)), class = "data.frame", row.names = c(NA, -10L))
Edit: full sized toy dataset
library(dplyr)
df1 <- expand.grid(section = 1:20000,
area = 1:15) %>%
mutate(a = runif(300000),
b = runif(300000),
c = runif(300000))
df2 <- data.frame(year = 1:100,
d = runif(100))
You can use crossing to create combinations of df1 and df2 and apply mult to them.
tidyr::crossing(df1, df2) %>% dplyr::mutate(e = mult(a, b, c, d))
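If the intermediate joined table is the concern, one alternative is to index both data frames with repeated row numbers and call mult once on the resulting vectors. This is a sketch on small stand-in data with arbitrary values; i, j and res are illustrative names. The output still has nrow(df1) * nrow(df2) rows, which cannot be avoided if every combination is needed.

```r
mult <- function(a, b, c, d) {a * b * c * d}

# Small stand-ins for the question's data (values are arbitrary)
df1 <- data.frame(section = c(1L, 2L, 1L, 2L), area = c(1L, 1L, 2L, 2L),
                  a = c(0.1, 0.2, 0.3, 0.4), b = c(0.5, 0.6, 0.7, 0.8),
                  c = c(0.9, 1.0, 1.1, 1.2))
df2 <- data.frame(year = 1:3, d = c(2, 3, 4))

n1 <- nrow(df1)
n2 <- nrow(df2)

# Row indices for every combination: all df1 rows for the first year,
# then all df1 rows for the second year, and so on
i <- rep(seq_len(n1), times = n2)
j <- rep(seq_len(n2), each = n1)

res <- data.frame(section = df1$section[i],
                  area    = df1$area[i],
                  year    = df2$year[j],
                  e       = mult(df1$a[i], df1$b[i], df1$c[i], df2$d[j]))
head(res)
```

Only the columns that appear in the desired output are built, so a, b, c and d are never copied into a wide intermediate table.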

List of data frames with names instead of numbers?

I am not sure if this question is too basic but as I haven't found an answer despite searching google for quite some time I have to ask here..
Suppose I want to create a list out of data frames (df1 and df2), how can I use the name of the data frame as the list "index"(?) instead of numbers? I.e., how do I get [[df1]] instead of [[1]] and [[df2]] instead of [[2]]?
list(structure(list(a = 1:10, b = 1:10), .Names = c("a", "b"), row.names = c(NA,
-10L), class = "data.frame"), structure(list(b = 1:10, a = 1:10), .Names = c("b",
"a"), row.names = c(NA, -10L), class = "data.frame"))
OK, entirely different way to ask this question to hopefully make things clearer ;)
I have three data frames
weguihl <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
raeg <- structure(list(b = 1:3, a = 1:3), .Names = c("b", "a"), row.names = c(NA, -3L), class = "data.frame")
awezilf <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
I want to create a list out of them..
li <- list(weguihl, raeg, awezilf)
But now I have the problem that - without remembering the order of the data frames - I do not know which data frame is which in the list..
> li
[[1]]
a b
1 1 1
2 2 2
3 3 3
[[2]]
b a
1 1 1
2 2 2
3 3 3
[[3]]
a b
1 1 1
2 2 2
3 3 3
Thus I'd prefer this output
> li
[[weguihl]]
a b
1 1 1
2 2 2
3 3 3
[[raeg]]
b a
1 1 1
2 2 2
3 3 3
[[awezilf]]
a b
1 1 1
2 2 2
3 3 3
How do I get there?
You could potentially achieve this with mget on a clean global environment.
Something like this:
Clean the global environment (note that this removes every object in the workspace)
rm(list = ls())
Your data frames
weguihl <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
raeg <- structure(list(b = 1:3, a = 1:3), .Names = c("b", "a"), row.names = c(NA, -3L), class = "data.frame")
awezilf <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
Running mget, which returns a named list of the data frames
li <- mget(ls(), .GlobalEnv)
li
# $awezilf
# a b
# 1 1 1
# 2 2 2
# 3 3 3
#
# $raeg
# b a
# 1 1 1
# 2 2 2
# 3 3 3
#
# $weguihl
# a b
# 1 1 1
# 2 2 2
# 3 3 3
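If clearing the workspace is not an option, the names can also be supplied explicitly. The sketch below shows two base R ways to do that on the question's three data frames; li and li2 are illustrative names.

```r
weguihl <- data.frame(a = 1:3, b = 1:3)
raeg    <- data.frame(b = 1:3, a = 1:3)
awezilf <- data.frame(a = 1:3, b = 1:3)

# Name the elements directly when building the list
li <- list(weguihl = weguihl, raeg = raeg, awezilf = awezilf)

# Or fetch specific objects by name; mget() names the result after them
li2 <- mget(c("weguihl", "raeg", "awezilf"), envir = environment())

names(li)
li[["raeg"]]
```

Note that named elements print as $weguihl rather than [[weguihl]], and can be accessed with li$weguihl or li[["weguihl"]].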

R: Subsetting a data.table with repeated column names with numerical positions

I have a data.table that looks like this
> DT
A B C A B C D
1: 1 2 3 3 5 6 7
2: 2 1 3 2 1 3 4
Here's the dput
DT <- structure(list(A = 1:2, B = c(2L, 1L), C = c(3L, 3L), A = c(3L,
2L), B = c(5L, 1L), C = c(6L, 3L), D = c(7L, 4L)), .Names = c("A",
"B", "C", "A", "B", "C", "D"), row.names = c(NA, -2L), class = c("data.table",
"data.frame"))
Basically, I want to subset them according to their headers. So for header "B", I would do this:
subset(DT,,grep(unique(names(DT))[2],names(DT)))
B B
1: 2 2
2: 1 1
As you can see, the values are wrong as the second column is simply a repeat of the first. I want to get this instead:
B B
1: 2 5
2: 1 1
Can anyone help me please?
The following alternatives work for me:
pos <- grep("B", names(DT))
DT[, ..pos]
# B B
# 1: 2 5
# 2: 1 1
DT[, .SD, .SDcols = patterns("B")]
# B B
# 1: 2 5
# 2: 1 1
DT[, names(DT) %in% unique(names(DT))[2], with = FALSE]
# B B
# 1: 2 5
# 2: 1 1
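One caveat with the grep-based variants is that "B" is treated as a regular expression, so a column named "AB" would match as well. A sketch of an exact-name lookup by position follows, assuming the DT from the question; pos and res are illustrative names.

```r
library(data.table)

# data.table() allows repeated column names
DT <- data.table(A = 1:2, B = c(2L, 1L), C = c(3L, 3L),
                 A = c(3L, 2L), B = c(5L, 1L), C = c(6L, 3L), D = c(7L, 4L))

# Positions of the columns whose name is exactly "B"
pos <- which(names(DT) == "B")

# Select by position so that each duplicate keeps its own values
res <- DT[, pos, with = FALSE]
res
```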

Resources