Join two data frames - r

I have to look up values in one data frame and place them in another. I tried the merge function, but it messes up the row order of the second data frame.
This is what my data looks like.
> df<-as.data.frame(cbind(letters[1:4],1:4))
> df
V1 V2
1 a 1
2 b 2
3 c 3
4 d 4
> dflist <- data.frame("home"= sample(df[,1],15, replace = TRUE))
>
> dflist$away <-sample(df[,1],15, replace = TRUE)
> dflist
home away
1 a b
2 a a
3 d c
4 d a
5 c c
6 a c
7 b d
8 b b
9 a b
10 b d
11 b a
12 a a
13 a c
14 c b
15 d a
Desired result should look like this.
home away value1 value2
1 a b 1 2
2 a a 1 1
3 d c 4 3
4 d a 4 1
5 c c 3 3
.
The output table loses its order if I use merge here.

You could try this:
dflist[c("value1", "value2")] <- t(apply(dflist, 1, function(x)
c(df[match(x[1], df$V1),2], df[match(x[2], df$V1),2])))
dflist
home away value1 value2
1 a b 1 2
2 a a 1 1
3 d c 4 3
4 d a 4 1
5 c c 3 3
6 a c 1 3
7 b d 2 4
8 b b 2 2
9 a b 1 2
10 b d 2 4
11 b a 2 1
12 a a 1 1
13 a c 1 3
14 c b 3 2
15 d a 4 1
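The row-wise apply can also be written as two vectorized match calls, which leave the rows of dflist in their original order. A minimal sketch, assuming the lookup table is built with data.frame() so that V2 stays numeric (cbind on letters and numbers coerces everything to character); the seed is arbitrary and only makes the sample reproducible:

```r
# Lookup table; data.frame() keeps V2 as integer, unlike cbind()
df <- data.frame(V1 = letters[1:4], V2 = 1:4)

set.seed(1)  # arbitrary seed, only so the sample is reproducible
dflist <- data.frame(home = sample(df$V1, 15, replace = TRUE),
                     away = sample(df$V1, 15, replace = TRUE))

# match() returns, for each home/away entry, its row position in df$V1
dflist$value1 <- df$V2[match(dflist$home, df$V1)]
dflist$value2 <- df$V2[match(dflist$away, df$V1)]
head(dflist)
```

Because match only computes positions and never reorders anything, the new columns line up with the existing rows.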

Related

Is there an R function to merge two data frames based on two columns separately matching to the same column?

I would like to populate two value columns ("VALA", "VALB") from "VAL", matching each of two columns separately.
# Data
DF1 <- data.frame("colA" = rep(c("A","B"), 6),
"colB" = rep(c("C","D","E"), 4))
DF2 <- data.frame("colC" = c("A","B","C","D","E"),
"VAL" = 1:5)
library(dplyr)
# three join calls
tmp1 <- left_join(DF1, DF2, by=c("colA"="colC"))
names(tmp1)[3] <- "VALA"
tmp2 <- left_join(DF1, DF2, by=c("colB"="colC"))
names(tmp2)[3] <- "VALB"
left_join(tmp1, tmp2, by=c("colA", "colB"))
# colA colB VALA VALB
# 1 A C 1 3
# 2 A C 1 3
# 3 B D 2 4
# 4 B D 2 4
# 5 A E 1 5
# 6 A E 1 5
# 7 B C 2 3
# 8 B C 2 3
# 9 A D 1 4
# 10 A D 1 4
# 11 B E 2 5
# 12 B E 2 5
# 13 A C 1 3
# 14 A C 1 3
# 15 B D 2 4
# 16 B D 2 4
# 17 A E 1 5
# 18 A E 1 5
# 19 B C 2 3
# 20 B C 2 3
# 21 A D 1 4
# 22 A D 1 4
# 23 B E 2 5
# 24 B E 2 5
Why does the last operation return 24 rows instead of the expected 12?
Is there a more elegant way to get the expected output (instead of three join operations)?
You can use match to find the corresponding values and cbind the resulting columns.
cbind(DF1, VALA=DF2$VAL[match(DF1$colA, DF2$colC)],
VALB=DF2$VAL[match(DF1$colB, DF2$colC)])
# colA colB VALA VALB
#1 A C 1 3
#2 B D 2 4
#3 A E 1 5
#4 B C 2 3
#5 A D 1 4
#6 B E 2 5
#7 A C 1 3
#8 B D 2 4
#9 A E 1 5
#10 B C 2 3
#11 A D 1 4
#12 B E 2 5
or use names:
x <- setNames(DF2$VAL, DF2$colC)
cbind(DF1, VALA=x[DF1$colA], VALB=x[DF1$colB])
and, for many columns, use match inside lapply:
cbind(DF1, setNames(lapply(DF1, function(x) DF2$VAL[match(x, DF2$colC)]),
sub("col", "VAL", names(DF1))))
# colA colB VALA VALB
#1 A C 1 3
#2 B D 2 4
#3 A E 1 5
#4 B C 2 3
#5 A D 1 4
#6 B E 2 5
#7 A C 1 3
#8 B D 2 4
#9 A E 1 5
#10 B C 2 3
#11 A D 1 4
#12 B E 2 5
Try chaining the left_join calls with %>% and defining the suffixes.
DF1 <- DF1 %>%
left_join(DF2, c("colA" = "colC")) %>%
left_join(DF2, c("colB" = "colC"),
suffix = c ("A", "B"))
> DF1
colA colB VALA VALB
1 A C 1 3
2 B D 2 4
3 A E 1 5
4 B C 2 3
5 A D 1 4
6 B E 2 5
7 A C 1 3
8 B D 2 4
9 A E 1 5
10 B C 2 3
11 A D 1 4
12 B E 2 5
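On the question of why the three-join version returns 24 rows: each (colA, colB) pair occurs twice in both tmp1 and tmp2, and a join pairs every matching left row with every matching right row, so each of the 6 distinct pairs yields 2 x 2 = 4 rows. A small sketch that rebuilds tmp1 and tmp2 with match and joins them with base merge (used here instead of left_join only to avoid a package dependency) shows the same count:

```r
DF1 <- data.frame(colA = rep(c("A", "B"), 6),
                  colB = rep(c("C", "D", "E"), 4))
DF2 <- data.frame(colC = c("A", "B", "C", "D", "E"), VAL = 1:5)

# Each distinct (colA, colB) pair appears twice in DF1
pair_counts <- table(paste(DF1$colA, DF1$colB))
pair_counts

# Same content as tmp1 and tmp2 in the question
tmp1 <- cbind(DF1, VALA = DF2$VAL[match(DF1$colA, DF2$colC)])
tmp2 <- cbind(DF1, VALB = DF2$VAL[match(DF1$colB, DF2$colC)])

# The key pairs are not unique, so every pair contributes 2 * 2 rows
joined <- merge(tmp1, tmp2, by = c("colA", "colB"))
nrow(joined)  # 24
```

Joining on a key that is unique in at least one of the two tables, as the chained left_join against DF2 does, keeps the row count at 12.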

merge by id and column name in R

I am trying to merge two data sets into one using id and column name as indices.
I have the following data
df <-
a b c d e f g id
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4
panel_empty <-
id df_id df_data df1_data df2_data df3_data
1 a
1 b
1 c
1 d
1 e
1 f
1 g
2 a
2 b
2 c
2 d
2 e
2 f
2 g
3 a
3 b
3 c
3 d
3 e
3 f
3 g
4 a
4 b
4 c
4 d
4 e
4 f
4 g
I would like to merge these somehow to look like this
panel_full <-
id df_id df_data df2_data df3_data
1 a 1
1 b 1
1 c 1
1 d 1
1 e 1
1 f 1
1 g 1
2 a 2
2 b 2
2 c 2
2 d 2
2 e 2
2 f 2
2 g 2
3 a 3
3 b 3
3 c 3
3 d 3
3 e 3
3 f 3
3 g 3
4 a 4
4 b 4
4 c 4
4 d 4
4 e 4
4 f 4
4 g 4
I only know how to merge by id but have no idea how to merge by id and column name. For panel data this is quite important, and I was surprised not to find a similar problem on this site.
EDIT:
So far, I was able to convert from wide to long
long <- melt(df, id.vars = c("id"))
However, I do not know how to move on.
I tried
m1 <- merge(panel_empty, long, by.x = "id", by.y = "df_id")
Here's a way with dplyr and tidyr::gather():
panel_full %>%
left_join(gather(df, df_id, df_data, -id), by = c("id", "df_id"))
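The same lookup by id and column name can be sketched in base R with matrix indexing: one match finds the row for each id, another finds the column for each df_id. This is an alternative to the join above, not the answerer's method, and the trimmed panel below (id and df_id only) is a stand-in for panel_empty:

```r
df <- data.frame(a = 1:4, b = 1:4, c = 1:4, d = 1:4,
                 e = 1:4, f = 1:4, g = 1:4, id = 1:4)

# Stand-in for panel_empty: one row per (id, column name) combination
panel <- data.frame(id    = rep(1:4, each = 7),
                    df_id = rep(letters[1:7], times = 4),
                    stringsAsFactors = FALSE)

vals <- as.matrix(df[letters[1:7]])          # value columns only
rows <- match(panel$id, df$id)               # row of df for each panel row
cols <- match(panel$df_id, colnames(vals))   # column for each panel row

# A two-column index matrix picks exactly one cell per panel row
panel$df_data <- vals[cbind(rows, cols)]
head(panel, 8)
```

Since nothing is joined or sorted, the panel keeps its row order, and ids or column names with no match come back as NA.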

How do I transform this matrix format to a matrix for repeated measures analysis in R? [duplicate]

Sorry "this matrix format" is very vague in my question (suggestions to improve my question?). I have a matrix that's like this
x <- data.frame(ID = c('A','B','C','D'), SCORE_YR1 = c(2,2,1,0),
SCORE_YR2 = c(2,3,3,1), SCORE_YR3 = c(0,2,2,5))
x
ID SCORE_YR1 SCORE_YR2 SCORE_YR3
1 A 2 2 0
2 B 2 3 2
3 C 1 3 2
4 D 0 1 5
I would like to transform the matrix format to look like this
y <- data.frame(ID = rep(c('A','B','C','D'),3), YEAR = rep(1:3,each=4),
SCORE = c(x$SCORE_YR1,x$SCORE_YR2,x$SCORE_YR3))
y
ID YEAR SCORE
1 A 1 2
2 B 1 2
3 C 1 1
4 D 1 0
5 A 2 2
6 B 2 3
7 C 2 3
8 D 2 1
9 A 3 0
10 B 3 2
11 C 3 2
12 D 3 5
Is there a function that can easily transform the dataframe like this?
Thanks
You can use melt from the reshape2 package:
library(reshape2)
x <- melt(x, id.vars = "ID")
Change column names to what you have above:
names(x)[2:3] <- c("YEAR","SCORE")
At this point the data frame looks like this:
> x
ID YEAR SCORE
1 A SCORE_YR1 2
2 B SCORE_YR1 2
3 C SCORE_YR1 1
4 D SCORE_YR1 0
5 A SCORE_YR2 2
6 B SCORE_YR2 3
7 C SCORE_YR2 3
8 D SCORE_YR2 1
9 A SCORE_YR3 0
10 B SCORE_YR3 2
11 C SCORE_YR3 2
12 D SCORE_YR3 5
melt returns YEAR as a factor, so calling as.numeric on it yields the integer level codes (1, 2, 3), which here coincide with the year numbers:
x$YEAR <- as.numeric(x$YEAR)
> x
ID YEAR SCORE
1 A 1 2
2 B 1 2
3 C 1 1
4 D 1 0
5 A 2 2
6 B 2 3
7 C 2 3
8 D 2 1
9 A 3 0
10 B 3 2
11 C 3 2
12 D 3 5
The problem is that you have data in a "wide" format and you want to convert it to "long". melt is usually great for these situations.
With dplyr and tidyr, you can do the following (extract_numeric is deprecated in current tidyr in favor of readr::parse_number):
library(dplyr); library(tidyr)
x %>%
gather(YEAR, SCORE, -ID) %>%
mutate(YEAR = extract_numeric(YEAR))
# ID YEAR SCORE
#1 A 1 2
#2 B 1 2
#3 C 1 1
#4 D 1 0
#5 A 2 2
#6 B 2 3
#7 C 2 3
#8 D 2 1
#9 A 3 0
#10 B 3 2
#11 C 3 2
#12 D 3 5
Or use reshape function from base R:
reshape(x, varying = 2:4, sep = "_YR", dir = "long", timevar = "YEAR")[1:3]
# ID YEAR SCORE
#1.1 A 1 2
#2.1 B 1 2
#3.1 C 1 1
#4.1 D 1 0
#1.2 A 2 2
#2.2 B 2 3
#3.2 C 2 3
#4.2 D 2 1
#1.3 A 3 0
#2.3 B 3 2
#3.3 C 3 2
#4.3 D 3 5
A base solution that would give you something that could easily be reworked to what you need would involve using stack. The data.frame function will do the "rep()-ing" for you via R's recycling rules:
y <- data.frame(x$ID, stack(x[-1]))
y
#-------------
x.ID values ind
1 A 2 SCORE_YR1
2 B 2 SCORE_YR1
3 C 1 SCORE_YR1
4 D 0 SCORE_YR1
5 A 2 SCORE_YR2
6 B 3 SCORE_YR2
7 C 3 SCORE_YR2
8 D 1 SCORE_YR2
9 A 0 SCORE_YR3
10 B 2 SCORE_YR3
11 C 2 SCORE_YR3
12 D 5 SCORE_YR3
This would convert the factor ind column to a numeric vector:
> y$ind <- seq_along(unique(y$ind))[y$ind]
> y
x.ID values ind
1 A 2 1
2 B 2 1
3 C 1 1
4 D 0 1
5 A 2 2
6 B 3 2
7 C 3 2
8 D 1 2
9 A 0 3
10 B 2 3
11 C 2 3
12 D 5 3
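The factor-based conversions above rely on the factor levels being in year order. If the column names might not be, the year can instead be parsed from the name itself. A hedged base-R sketch, assuming every score column is named SCORE_YR followed by the year number:

```r
x <- data.frame(ID = c("A", "B", "C", "D"), SCORE_YR1 = c(2, 2, 1, 0),
                SCORE_YR2 = c(2, 3, 3, 1), SCORE_YR3 = c(0, 2, 2, 5))

long <- data.frame(ID = x$ID, stack(x[-1]))  # ID is recycled for each column
names(long) <- c("ID", "SCORE", "YEAR")

# Strip the assumed "SCORE_YR" prefix and keep the digits as an integer
long$YEAR <- as.integer(sub("^SCORE_YR", "", as.character(long$YEAR)))
long <- long[c("ID", "YEAR", "SCORE")]
long
```

Parsing the name keeps the year correct even if a year is missing or the columns are not in chronological order.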

Counting how many times an element occurs in the column of a data.frame

Let's say I have a data.frame with a factor.
d = data.frame(f = c("a","a","a","b","b","b","b","d","d"))
f
1 a
2 a
3 a
4 b
5 b
6 b
7 b
8 d
9 d
And I want to add a column telling me how many times an element occurs.
Like this
f n
1 a 3
2 a 3
3 a 3
4 b 4
5 b 4
6 b 4
7 b 4
8 d 2
9 d 2
How would I do this?
You can also use the plyr functions join and ddply:
library(plyr)
d <- data.frame(f = c("a","a","a","b","b","b","b","d","d"))
d2 <- join(d, ddply(d, .(f), 'nrow'))
d2
f nrow
1 a 3
2 a 3
3 a 3
4 b 4
5 b 4
6 b 4
7 b 4
8 d 2
9 d 2
You can use table like this:
d$n <- table(d$f)[d$f]
# f n
#1 a 3
#2 a 3
#3 a 3
#4 b 4
#5 b 4
#6 b 4
#7 b 4
#8 d 2
#9 d 2
You can use ave and length:
> d$n <- as.numeric(ave(as.character(d$f), d$f, FUN = length))
> d
f n
1 a 3
2 a 3
3 a 3
4 b 4
5 b 4
6 b 4
7 b 4
8 d 2
9 d 2
With the "data.table" package, you might do something like:
library(data.table)
D <- data.table(d)
D[, n := as.numeric(.N), by = f]
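A variation on the ave answer that avoids the character round trip: apply length to the row indices within each group, so the result is numeric from the start. A minimal base-R sketch:

```r
d <- data.frame(f = c("a", "a", "a", "b", "b", "b", "b", "d", "d"))

# ave() applies length() to the row indices of each group in f and
# returns the group size aligned with the original rows
d$n <- ave(seq_along(d$f), d$f, FUN = length)
d
```

The rows stay in their original order because ave writes each group's result back into the positions it came from.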

Reproduce character pattern as numeric pattern

I would like to expand the following data frame
d <- data.frame(a = c(rep("A",5),rep("B",5),rep("C",3),rep("D",2)))
> d
a
1 A
2 A
3 A
4 A
5 A
6 B
7 B
8 B
9 B
10 B
11 C
12 C
13 C
14 D
15 D
so that there is a column b looking like:
> d
a b
1 A 1
2 A 1
3 A 1
4 A 1
5 A 1
6 B 2
7 B 2
8 B 2
9 B 2
10 B 2
11 C 3
12 C 3
13 C 3
14 D 4
15 D 4
Not really sure how to realise that.
Use match:
d$b <- match(d$a, unique(d$a))
Or convert to a factor whose levels follow the order of appearance:
d$b <- as.integer(factor(d$a, levels=unique(d$a)))
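The levels = unique(d$a) part matters when the values do not first appear in sorted order: without it, factor() sorts the levels and the codes follow alphabetical order rather than order of appearance. A small sketch with made-up data:

```r
a <- c("C", "C", "A", "A", "B")  # made-up, deliberately unsorted

by_appearance <- match(a, unique(a))                        # 1 1 2 2 3
via_factor    <- as.integer(factor(a, levels = unique(a)))  # same codes
alphabetical  <- as.integer(factor(a))                      # 3 3 1 1 2
```

For the data in the question the two orders coincide, since A, B, C, D already appear alphabetically.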
