R: column binding with unequal number of rows

I have two data sets. Each of them has the variables ID, Block, and RT (reaction time). I want to merge/column-bind the two sets so that I have one data set with the variables ID, Block, RT1, and RT2. The problem is that the two sets have unequal numbers of rows. It is also important that the ID and block number match; missing values should be replaced with NA. So what I have:
head(blok1, 10)
ID Blok RT1
1 1 1 592
2 1 1 468
3 1 1 530
4 1 1 546
5 1 1 452
6 1 1 483
7 1 2 499
8 1 2 452
9 1 2 608
10 1 2 530
head(blok2, 10)
ID Blok RT2
1 1 1 592
2 1 1 920
3 1 1 686
4 1 1 561
5 1 1 561
6 1 2 327
7 1 2 686
8 1 2 670
9 1 2 702
10 1 3 920
What I want to have:
ID Blok RT1 RT2
1 1 1 592 592
2 1 1 468 920
3 1 1 530 686
4 1 1 546 561
5 1 1 452 561
6 1 1 483 NA
7 1 2 499 327
8 1 2 452 686
9 1 2 608 670
10 1 2 530 702
etc.

Here's a dplyr solution that also uses a row index as a unique ID:
blok1 <- data.frame(ID = c(1, 1, 2), RT1 = c(11, 12, 13))
blok2 <- data.frame(ID = c(1, 2, 2), RT2 = c(21, 22, 23))
library(dplyr)
## if you want NAs for RT2 only
blok1 %>%
  mutate(uID = row_number()) %>%
  left_join(blok2 %>% mutate(uID = row_number()), by = c("uID", "ID"))
# uID ID RT1 RT2
# 1 1 1 11 21
# 2 2 1 12 NA
# 3 3 2 13 23
## if you want NAs for both RT1 and RT2
blok1 %>%
  mutate(uID = row_number()) %>%
  full_join(blok2 %>% mutate(uID = row_number()), by = c("uID", "ID"))
# uID ID RT1 RT2
# 1 1 1 11 21
# 2 2 1 12 NA
# 3 3 2 13 23
# 4 2 2 NA 22
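The same idea works in base R. Here is a sketch, under the assumption that rows should be matched positionally within each ID: build a per-ID row index with ave() and then merge() with all = TRUE so unmatched rows become NA. Note this matches within ID rather than by global row number, which is closer to the original question of matching on ID and block.

```r
blok1 <- data.frame(ID = c(1, 1, 2), RT1 = c(11, 12, 13))
blok2 <- data.frame(ID = c(1, 2, 2), RT2 = c(21, 22, 23))

# per-ID row index: the k-th row of an ID in blok1 pairs with
# the k-th row of the same ID in blok2
blok1$uID <- ave(blok1$ID, blok1$ID, FUN = seq_along)
blok2$uID <- ave(blok2$ID, blok2$ID, FUN = seq_along)

# all = TRUE keeps unmatched rows from both sides, filled with NA
res <- merge(blok1, blok2, by = c("ID", "uID"), all = TRUE)
res
```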

Sum 1:n by group

I have a dataset in which I need a running sum over rows 1:i for each row i within each group.
demo <- data.frame(th = c(0, 24, 26, 0, 1, 2, 4),
                   hs = c(rep(220, 3), rep(240, 4)),
                   seq = c(1:3, 1:4),
                   group = c(rep(1, 3), rep(2, 4)))
Here's what that looks like:
> demo
th hs seq group
1 0 220 1 1
2 24 220 2 1
3 26 220 3 1
4 0 240 1 2
5 1 240 2 2
6 2 240 3 2
7 4 240 4 2
I need a new column based on the hs, seq, and th columns: for each row, the running sum of hs raised to the seq power, times th, accumulated over all rows up to that row within the group.
demo[1,"an"]<- demo[1,"hs"]^demo[1,"seq"] * demo[1,"th"]
demo[2,"an"]<-sum(demo[1,"hs"]^demo[1,"seq"] * demo[1,"th"],
demo[2,"hs"]^demo[2,"seq"] * demo[2,"th"] )
demo[3,"an"]<-sum(demo[1,"hs"]^demo[1,"seq"] * demo[1,"th"],
demo[2,"hs"]^demo[2,"seq"] * demo[2,"th"],
demo[3,"hs"]^demo[3,"seq"] * demo[3,"th"])
demo[6,"an"]<-sum(demo[4,"hs"]^demo[4,"seq"] * demo[4,"th"],
demo[5,"hs"]^demo[5,"seq"] * demo[5,"th"],
demo[6,"hs"]^demo[6,"seq"] * demo[6,"th"])
Here's what that new column (an) should look like
> demo
th hs seq group an
1 0 220 1 1 0
2 24 220 2 1 1161600
3 26 220 3 1 278009600
4 0 240 1 2 NA
5 1 240 2 2 NA
6 2 240 3 2 27705600
7 4 240 4 2 NA
Ignore the NA's in this MRE, those need to be filled in too.
Libraries
library(tidyverse)
Sample data
df <-
read.csv(
text =
"th hs seq group
0 220 1 1
24 220 2 1
26 220 3 1
0 240 1 2
1 240 2 2
2 240 3 2
4 240 4 2",
sep = " ", header = TRUE
)
Code
df %>%
  # grouping by group
  group_by(group) %>%
  # applying a cumulative sum of the formula, by group
  mutate(an = cumsum(hs^seq * th))
Output
th hs seq group an
<int> <int> <int> <int> <dbl>
1 0 220 1 1 0
2 24 220 2 1 1161600
3 26 220 3 1 278009600
4 0 240 1 2 0
5 1 240 2 2 57600
6 2 240 3 2 27705600
7 4 240 4 2 13298745600
We can use data.table
library(data.table)
setDT(df)[, an := cumsum(hs^seq * th), group]
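For completeness, the same grouped cumulative sum can be sketched in base R with ave(), assuming the same demo data as above:

```r
df <- data.frame(th = c(0, 24, 26, 0, 1, 2, 4),
                 hs = c(rep(220, 3), rep(240, 4)),
                 seq = c(1:3, 1:4),
                 group = c(rep(1, 3), rep(2, 4)))

# cumulative sum of hs^seq * th within each group
df$an <- ave(df$hs^df$seq * df$th, df$group, FUN = cumsum)
df
```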

R Loop To New Data Frame Summary Weighted

I have a tall data frame as such:
data = data.frame("id"=c(1,2,3,4,5,6,7,8,9,10),
"group"=c(1,1,2,1,2,2,2,2,1,2),
"type"=c(1,1,2,3,2,2,3,3,3,1),
"score1"=c(sample(1:4,10,r=T)),
"score2"=c(sample(1:4,10,r=T)),
"score3"=c(sample(1:4,10,r=T)),
"score4"=c(sample(1:4,10,r=T)),
"score5"=c(sample(1:4,10,r=T)),
"weight1"=c(173,109,136,189,186,146,173,102,178,174),
"weight2"=c(147,187,125,126,120,165,142,129,144,197),
"weight3"=c(103,192,102,159,128,179,195,193,135,145),
"weight4"=c(114,182,199,101,111,116,198,123,119,181),
"weight5"=c(159,125,104,171,166,154,197,124,180,154))
library(reshape2)
library(plyr)
data1 <- reshape(data, direction = "long",
varying = list(c(paste0("score",1:5)),c(paste0("weight",1:5))),
v.names = c("score","weight"),
idvar = "id", timevar = "count", times = c(1:5))
data1 <- data1[order(data1$id), ]
And what I want to create is a new data frame like so:
want = data.frame("score"=rep(1:4,6),
"group"=rep(1:2,12),
"type"=rep(1:3,8),
"weightedCOUNT"=NA) # how to calculate this? count(data1, score, wt = weight)
I am just not sure how to calculate weightedCOUNT which should apply the weights to the score variable so then it gives in column 'weightedCOUNT' a weighted count that is aggregated by score and group and type.
An option would be to melt (from data.table, which can take multiple measure patterns) and then, grouped by 'group' and 'type', get the count:
library(data.table)
library(dplyr)
melt(setDT(data), measure = patterns('^score', "^weight"),
     value.name = c("score", "weight")) %>%
  group_by(group, type) %>%
  count(score, wt = weight)
If we need to have a complete set of combinations
library(tidyr)
melt(setDT(data), measure = patterns('^score', "^weight"),
     value.name = c("score", "weight")) %>%
  group_by(group, type) %>%
  count(score, wt = weight) %>%
  ungroup %>%
  complete(group, type, score, fill = list(n = 0))
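A tidyr-only reshaping is also possible via pivot_longer() and its ".value" sentinel, which splits names like score1 into a value column ("score") and an index ("count"). The small toy data below is an assumption for illustration, not the OP's full data:

```r
library(tidyr)
library(dplyr)

data <- data.frame(id = 1:4,
                   group = c(1, 1, 2, 2),
                   type = c(1, 2, 1, 2),
                   score1 = c(1, 2, 3, 4), score2 = c(2, 3, 4, 1),
                   weight1 = c(10, 20, 30, 40), weight2 = c(5, 15, 25, 35))

# names_pattern splits "score1" into .value = "score" and count = "1"
long <- pivot_longer(data, -c(id, group, type),
                     names_to = c(".value", "count"),
                     names_pattern = "([a-z]+)(\\d)")

res <- count(long, group, type, score, wt = weight)
res
```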
If I understand correctly, weightedCOUNT is the sum of weights grouped by score, group, and type.
For the sake of completeness, I would like to show how the accepted solution looks when implemented in pure base R and pure data.table syntax, respectively.
Base R
The OP was almost there: the data has already been reshaped from wide to long format with multiple value variables. Only the final aggregation step was missing:
data1 <- reshape(data, direction = "long",
varying = list(c(paste0("score",1:5)),c(paste0("weight",1:5))),
v.names = c("score","weight"),
idvar = "id", timevar = "count", times = c(1:5))
result <- aggregate(weight ~ score + group + type, data1, FUN = sum)
result
score group type weight
1 1 1 1 479
2 3 1 1 558
3 4 1 1 454
4 1 2 1 378
5 2 2 1 154
6 3 2 1 174
7 4 2 1 145
8 1 2 2 535
9 2 2 2 855
10 3 2 2 248
11 4 2 2 499
12 1 1 3 189
13 2 1 3 351
14 3 1 3 600
15 4 1 3 362
16 1 2 3 596
17 2 2 3 265
18 3 2 3 193
19 4 2 3 522
result can be reordered by
with(result, result[order(score, group, type), ])
score group type weight
1 1 1 1 479
12 1 1 3 189
4 1 2 1 378
8 1 2 2 535
16 1 2 3 596
13 2 1 3 351
5 2 2 1 154
9 2 2 2 855
17 2 2 3 265
2 3 1 1 558
14 3 1 3 600
6 3 2 1 174
10 3 2 2 248
18 3 2 3 193
3 4 1 1 454
15 4 1 3 362
7 4 2 1 145
11 4 2 2 499
19 4 2 3 522
data.table
As shown by akrun, melt() from the data.table package can be combined with dplyr. Alternatively, we can stay with the data.table syntax for aggregation:
library(data.table)
cols <- c("score", "weight") # to save typing
melt(setDT(data), measure = patterns(cols), value.name = cols)[
, .(weightedCOUNT = sum(weight)), keyby = .(score, group, type)]
score group type weightedCOUNT
1: 1 1 1 479
2: 1 1 3 189
3: 1 2 1 378
4: 1 2 2 535
5: 1 2 3 596
6: 2 1 3 351
7: 2 2 1 154
8: 2 2 2 855
9: 2 2 3 265
10: 3 1 1 558
11: 3 1 3 600
12: 3 2 1 174
13: 3 2 2 248
14: 3 2 3 193
15: 4 1 1 454
16: 4 1 3 362
17: 4 2 1 145
18: 4 2 2 499
19: 4 2 3 522
The keyby parameter is used for grouping and ordering the output in one step.
Completion of missing combinations of the grouping variables is also possible in data.table syntax using the cross join function CJ():
melt(setDT(data), measure = patterns(cols), value.name = cols)[
, .(weightedCOUNT = sum(weight)), keyby = .(score, group, type)][
CJ(score, group, type, unique = TRUE), on = .(score, group, type)][
is.na(weightedCOUNT), weightedCOUNT := 0][]
score group type weightedCOUNT
1: 1 1 1 479
2: 1 1 2 0
3: 1 1 3 189
4: 1 2 1 378
5: 1 2 2 535
6: 1 2 3 596
7: 2 1 1 0
8: 2 1 2 0
9: 2 1 3 351
10: 2 2 1 154
11: 2 2 2 855
12: 2 2 3 265
13: 3 1 1 558
14: 3 1 2 0
15: 3 1 3 600
16: 3 2 1 174
17: 3 2 2 248
18: 3 2 3 193
19: 4 1 1 454
20: 4 1 2 0
21: 4 1 3 362
22: 4 2 1 145
23: 4 2 2 499
24: 4 2 3 522
score group type weightedCOUNT

How to rank a column with a condition

I have a data frame :
dt <- read.table(text = "
1 390
1 366
1 276
1 112
2 97
2 198
2 400
2 402
3 110
3 625
4 137
4 49
4 9
4 578 ")
The first column is an index and the second a distance.
I want to add a column that ranks the distances within each index in descending order (the highest distance gets rank 1).
The desired result:
dt <- read.table(text = "
1 390 1
1 366 2
1 276 3
1 112 4
2 97 4
2 198 3
2 400 2
2 402 1
3 110 2
3 625 1
4 137 2
4 49 3
4 9 4
4 578 1")
Another R base approach
> dt$Rank <- unlist(tapply(-dt$V2, dt$V1, rank))
A tidyverse solution
dt %>%
  group_by(V1) %>%
  mutate(Rank = rank(-V2))
transform(dt, s = ave(-V2, V1, FUN = rank))
V1 V2 s
1 1 390 1
2 1 366 2
3 1 276 3
4 1 112 4
5 2 97 4
6 2 198 3
7 2 400 2
8 2 402 1
9 3 110 2
10 3 625 1
11 4 137 2
12 4 49 3
13 4 9 4
14 4 578 1
You could group, arrange, and rownumber. The result is a bit easier on the eyes than a simple rank, I think, and so worth an extra step.
dt %>%
  group_by(V1) %>%
  arrange(V1, desc(V2)) %>%
  mutate(rank = row_number())
# A tibble: 14 x 3
# Groups: V1 [4]
V1 V2 rank
<int> <int> <int>
1 1 390 1
2 1 366 2
3 1 276 3
4 1 112 4
5 2 402 1
6 2 400 2
7 2 198 3
8 2 97 4
9 3 625 1
10 3 110 2
11 4 578 1
12 4 137 2
13 4 49 3
14 4 9 4
An alternative that leaves the rows in their original (unsorted) order is min_rank:
dt %>%
  group_by(V1) %>%
  mutate(Rank = min_rank(desc(V2)))
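A data.table sketch of the same grouped ranking is also possible with frank(); the short dt below is an assumed subset of the question's data, just for illustration:

```r
library(data.table)

dt <- data.table(V1 = c(1, 1, 1, 2, 2),
                 V2 = c(390, 366, 112, 97, 402))

# frank(-V2) ranks the largest distance first within each index
dt[, Rank := frank(-V2), by = V1]
dt
```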

Add column with numbers based on a second column

Here my data.frame:
df = read.table(text = 'Day ID Event
100 1 1
100 1 1
99 1 1
97 1 1
87 2 1
86 2 1
85 2 1
965 1 2
964 1 2
960 1 2
959 1 2
709 2 2
708 2 2
12 3 2
9 3 2', header = TRUE)
What I would like to do is create a new column that, within each ID and Event combination, numbers each observation by its Day relative to the earliest Day in that group.
My desired output would be:
Day ID Event Count
100 1 1 4
100 1 1 4
99 1 1 3
97 1 1 1
87 2 1 3
86 2 1 2
85 2 1 1
965 1 2 7
964 1 2 6
960 1 2 2
959 1 2 1
709 2 2 2
708 2 2 1
12 3 2 4
9 3 2 1
E.g. if you look at the first 'block' above: Day 97 = 1, Day 98 = 2, Day 99 = 3, and Day 100 = 4. We are missing Day 98, but we still need to include it in the count.
I tried the following but the output is not the one I need:
df$Count <- ave(df$Day, df$Event, df$ID, FUN = seq_along)
Thanks for your help
We can try
library(dplyr)
df %>%
  group_by(ID, Event) %>%
  mutate(Count = 1 + (Day - Day[n()])) # Day[n()] is the earliest Day, since Day is sorted descending within each group
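A base R sketch of the same idea with ave(); using min(x) instead of the last element also makes it independent of the row order within each group:

```r
df <- read.table(text = 'Day ID Event
100 1 1
100 1 1
99 1 1
97 1 1
12 3 2
9 3 2', header = TRUE)

# distance from the earliest Day in each (ID, Event) group, starting at 1
df$Count <- ave(df$Day, df$ID, df$Event, FUN = function(x) 1 + x - min(x))
df
```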

Create a new column with a sum based on the value of three other columns

I have a data frame and I want to create another column based on the information of three different columns. I am using R.
I want to start counting at 0 and add 2 in each new cell, based on the Time column and on the Item and Part information. The count should restart at 0 at the beginning of Time (which is in ms) for each item of each participant.
df <- data.frame(Item=c(1,1,1,1,1,1,2,2,2,2,2,2),
Part=c(1,1,1,2,2,2,1,1,1,2,2,2),
Time=c(1234,1235,1236,345,346,347,1546,1547,1548,234,235,236))
Item Part Time
1 1 1 1234
2 1 1 1235
3 1 1 1236
4 1 2 345
5 1 2 346
6 1 2 347
7 2 1 1546
8 2 1 1547
9 2 1 1548
10 2 2 234
11 2 2 235
12 2 2 236
With the new column the table would be something like:
Item Part Time NewColumn
1 1 1 1234 0
2 1 1 1235 2
3 1 1 1236 4
4 1 2 345 0
5 1 2 346 2
6 1 2 347 4
7 2 1 1546 0
8 2 1 1547 2
9 2 1 1548 4
10 2 2 234 0
11 2 2 235 2
12 2 2 236 4
Many thanks in advance.
If the structure stays as it is (exactly three rows per Item/Part group):
library(dplyr)
result <- df %>% group_by(Part, Item) %>% mutate(NewColumn = seq(0, 4, 2))
I group by Item and Part and create a new column that counts 0, 2, 4
Item Part Time NewColumn
1 1 1 1234 0
2 1 1 1235 2
3 1 1 1236 4
4 1 2 345 0
5 1 2 346 2
6 1 2 347 4
7 2 1 1546 0
8 2 1 1547 2
9 2 1 1548 4
10 2 2 234 0
11 2 2 235 2
12 2 2 236 4
In order to be more flexible (if you have more than 3 rows per group), you can use
result <- df %>% group_by(Part, Item) %>% mutate(NewColumn = 2 * (row_number() - 1))
which will generate numbers in the sequence 0, 2, 4, 6, 8, ...
library(data.table)
df <- data.table(df)
df[, NewCol := 2 * (seq_len(.N) - 1), by = list(Item, Part)]
Er... df = cbind(df,NewColumn=c(0,2,4))?
+1 for library(plyr)
library(plyr)
ddply(df, c("Item", "Part"), mutate, NewColumn = seq(0, 4, 2))
Item Part Time NewColumn
1 1 1234 0
1 1 1235 2
1 1 1236 4
1 2 345 0
1 2 346 2
1 2 347 4
2 1 1546 0
2 1 1547 2
2 1 1548 4
2 2 234 0
2 2 235 2
2 2 236 4
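Finally, a base R sketch that does not hard-code the group size, using ave() over a per-group row index (shown on a small assumed subset of the data):

```r
df <- data.frame(Item = c(1, 1, 1, 2, 2, 2),
                 Part = c(1, 1, 1, 1, 1, 1),
                 Time = c(1234, 1235, 1236, 1546, 1547, 1548))

# 0, 2, 4, ... within each (Item, Part) group, whatever the group size
df$NewColumn <- ave(df$Time, df$Item, df$Part,
                    FUN = function(x) 2 * (seq_along(x) - 1))
df
```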
