Match values based on multiple conditions from dataframes of different sizes in R

I have two dataframes of different sizes. Example:
t1 <- data.frame("id"=c(1,1,1,2,2,2,4,5,5,5,6,7,8),"condition"=c(3,3,1,5,5,5,10,10,5,5,2,3,1) )
t2 <- data.frame("ind"=c(1,2,4,5,6,7,8),"test_c"=c(3,5,10,10,2,3,1), "time"=c(32,55,21,34,55,22,19))
I would like to match rows on two criteria, t1$id==t2$ind and t1$condition==t2$test_c, and add a column to t1 that holds the value of t2$time for the rows where both conditions are met.
Expected outcome:
t3 <- data.frame("id"=c(1,1,1,2,2,2,4,5,5,5,6,7,8),"condition"=c(3,3,1,5,5,5,10,10,5,5,2,3,1) , "time"=c (32,32,NA,55,55,55,21,34,NA,NA,55,22,19))
I suspect I should use merge or match functions but I am not sure which would be the right approach.

Base R
> out <- merge(t1, t2, by.x=c("id","condition"), by.y=c("ind","test_c"), all.x=TRUE)
> out
id condition time
1 1 1 NA
2 1 3 32
3 1 3 32
4 2 5 55
5 2 5 55
6 2 5 55
7 4 10 21
8 5 5 NA
9 5 5 NA
10 5 10 34
11 6 2 55
12 7 3 22
13 8 1 19
dplyr
library(dplyr)
left_join(t1, t2, by = c("id" = "ind", "condition" = "test_c"))
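One practical difference between the two: merge returns the rows sorted by the key columns (as the output above shows), while left_join keeps the row order of t1. If the original order matters in base R, a minimal sketch is to carry a helper index through the merge. The column name row_id is an assumption introduced here for illustration; it is not part of the original data.

```r
t1 <- data.frame(id = c(1,1,1,2,2,2,4,5,5,5,6,7,8),
                 condition = c(3,3,1,5,5,5,10,10,5,5,2,3,1))
t2 <- data.frame(ind = c(1,2,4,5,6,7,8),
                 test_c = c(3,5,10,10,2,3,1),
                 time = c(32,55,21,34,55,22,19))

# Hypothetical helper column that remembers each row's original position
t1$row_id <- seq_len(nrow(t1))

out <- merge(t1, t2, by.x = c("id", "condition"),
             by.y = c("ind", "test_c"), all.x = TRUE)

# Restore the original order of t1 and drop the helper column
out <- out[order(out$row_id), c("id", "condition", "time")]
rownames(out) <- NULL
out$time
# 32 32 NA 55 55 55 21 34 NA NA 55 22 19
```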
Differences with your t3
There are some differences between them. For the sake of display, I'll show them side-by-side, arranged so that we have an easier comparison.
cbind(out[with(out,order(id,condition)),], t3[with(t3,order(id,condition)),])
# id condition time id condition time
# 1 1 1 NA 1 1 NA
# 2 1 3 32 1 3 32
# 3 1 3 32 1 3 32
# 4 2 5 55 2 5 55
# 5 2 5 55 2 5 NA
# 6 2 5 55 2 5 NA
# 7 4 10 21 4 10 21
# 8 5 5 NA 5 5 NA
# 9 5 5 NA 5 5 NA
# 10 5 10 34 5 10 34
# 11 6 2 55 6 2 55
# 12 7 3 22 7 3 22
# 13 8 1 19 8 1 19
The only differences are with id=2,condition=5, where all of them in the merge are assigned the same time=55, and your t3 fills only the first of them. I don't think this is a "first only" logic, as there are other repeat id,condition that do not elicit the same response. I suspect this is just a mistake with the sample data, or perhaps there is post-merge processing you haven't told us yet :-)

If you want to use match instead, you can combine multiple columns into a single key with interaction (or paste):
t1$time <- t2[match(interaction(t1), interaction(t2[-3])), 3]
t1
# id condition time
#1 1 3 32
#2 1 3 32
#3 1 1 NA
#4 2 5 55
#5 2 5 55
#6 2 5 55
#7 4 10 21
#8 5 10 34
#9 5 5 NA
#10 5 5 NA
#11 6 2 55
#12 7 3 22
#13 8 1 19
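The paste variant mentioned above might look like the following sketch. The names key1 and key2 and the "_" separator are assumptions chosen for illustration; any separator that cannot occur inside the values would do.

```r
t1 <- data.frame(id = c(1,1,1,2,2,2,4,5,5,5,6,7,8),
                 condition = c(3,3,1,5,5,5,10,10,5,5,2,3,1))
t2 <- data.frame(ind = c(1,2,4,5,6,7,8),
                 test_c = c(3,5,10,10,2,3,1),
                 time = c(32,55,21,34,55,22,19))

# Build one character key per row from the two matching columns
key1 <- paste(t1$id, t1$condition, sep = "_")
key2 <- paste(t2$ind, t2$test_c, sep = "_")

# match() returns, for each row of t1, the position of the first matching
# key in t2, or NA when there is none
t1$time <- t2$time[match(key1, key2)]
t1$time
# 32 32 NA 55 55 55 21 34 NA NA 55 22 19
```

Naming the columns explicitly also means the lookup can be re-run after t1 has gained the time column, since the key no longer depends on every column of t1.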

Related

Assign ID based on a sequence of consecutive days in R

I have a dataset with repeated measures which I want to use to assign IDs. The repeated measures are from a sequence of consecutive days. However, the sequence itself may be unbalanced (e.g., some have more days while others have fewer, some start with day 1 but a few others may start with 2 or 3). My question is how to create and assign the same ID within the same block of the sequence. Here is a toy dataset:
days <- data.frame(
day = c(1L,2L,3L,4L,5L,6L,8L,9L,10L,
2L,3L,4L,5L,6L,7L,9L,10L,
1L,2L,4L,5L,6L,8L,9L,10L,
1L,2L,3L,4L,5L,6L,7L,8L,9L,10L)
)
Here is the end result I expect:
id day
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 1 6
7 1 8
8 1 9
9 1 10
10 2 2
11 2 3
12 2 4
13 2 5
14 2 6
15 2 7
16 2 9
17 2 10
18 3 1
19 3 2
20 3 4
21 3 5
22 3 6
23 3 8
24 3 9
25 3 10
26 4 1
27 4 2
28 4 3
29 4 4
30 4 5
31 4 6
32 4 7
33 4 8
34 4 9
35 4 10

Take the difference between adjacent elements, check whether it is less than 0 (the day number drops whenever a new block starts), and take the cumulative sum:
days$id <- cumsum(c(TRUE, diff(days$day) < 0))
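To see why this works, here is a small step-by-step sketch on a shorter vector (the vector itself is made up for illustration). It assumes, as in the toy data, that every new block starts at a lower day number than the last day of the previous block.

```r
day <- c(1L, 2L, 3L, 8L, 2L, 3L, 9L, 1L, 4L)

# Difference between each element and the one before it
steps <- diff(day)
# 1 1 5 -6 1 6 -8 3

# A negative step marks the first row of a new block; the leading TRUE
# makes the very first row start block 1
starts <- c(TRUE, steps < 0)

# Counting the block starts so far gives the block id
id <- cumsum(starts)
id
# 1 1 1 1 2 2 2 3 3
```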

Assign value to a column of data frame from a vector in R

I have a df like this:
df.temp1 = data.frame("A"=c(2,3,4,6,8,2,5,7))
> df.temp1
A
1 2
2 3
3 4
4 6
5 8
6 2
7 5
8 7
And a vector like this:
vec_list = c(10,20,90,40,60,70,80,100)
I need a new column B whose values are taken from vec_list, using the values in A as indices into vec_list. The desired output is:
A B
1 2 20
2 3 90
3 4 40
4 6 70
5 8 100
6 2 20
7 5 60
8 7 80
How to do it? I tried melt, but got errors.

We can make use of the 'A' column as index for subsetting the 'vec_list' and assign it to 'B'
df.temp1$B <- vec_list[df.temp1$A]
df.temp1
# A B
#1 2 20
#2 3 90
#3 4 40
#4 6 70
#5 8 100
#6 2 20
#7 5 60
#8 7 80
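One thing to keep in mind with this kind of positional lookup: an index beyond the length of the vector yields NA rather than an error. A small sketch (the index vector A here is made up for illustration):

```r
vec_list <- c(10, 20, 90, 40, 60, 70, 80, 100)

# 9 is past the end of vec_list (length 8), so that lookup gives NA
A <- c(2, 3, 9)
B <- vec_list[A]
B
# 20 90 NA
```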

selecting common columns from different elements of a list

I have a data set in list format. The list is divided into 20 elements, each containing 12 rows and some columns. I want to extract the columns that are common to every element of the list and build a new data set from them. I tried to make a reproducible example; please see the code:
a<-data.frame(x=(1:10),y=(1:10),z=(1:10))
b<-data.frame(x=(1:10),y=(1:10),n=(1:10))
c<-data.frame(x=(1:10),y=(1:10),q=(1:10))
data<-list(a,b,c)
library(plyr)  # provides ldply
data1<-ldply(data)
required_data<-data1[,-3:-5]

Find the common columns using Reduce, subset them from list and bind them together
cols <- Reduce(intersect, lapply(data, colnames))
do.call(rbind, lapply(data, `[`, cols))
# x y
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
#6 6 6
#7 7 7
#8 8 8
#9 9 9
#10 10 10
#11 1 1
#...
The last step can also be performed using
purrr::map_df(data, `[`, cols)
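As a sketch of what Reduce(intersect, ...) does when a column is missing from just one element, consider a variation of the example in which the third data frame lacks y (the list name frames and its contents are assumptions for illustration):

```r
frames <- list(
  data.frame(x = 1:3, y = 1:3, z = 1:3),
  data.frame(x = 1:3, y = 1:3, n = 1:3),
  data.frame(x = 1:3, q = 1:3)
)

# Reduce applies intersect pairwise: first the names of elements 1 and 2,
# then that result with the names of element 3
cols <- Reduce(intersect, lapply(frames, colnames))
cols
# "x"

# Keep only the shared columns and stack the rows
combined <- do.call(rbind, lapply(frames, `[`, cols))
dim(combined)
# 9 1
```

Here y is dropped even though it appears in two of the three data frames, because only columns present in every element survive the intersection.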

With base R, you can first find the names that appear in every element of the list
commonName <- names((r<-table(unlist(Map(names,data))))[r==length(data)])
then retrieve those columns from each list element and bind them together (similar to the second step in the solution by @Ronak Shah)
res <- Reduce(rbind,lapply(data, '[',commonName))
which gives:
> res
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
11 1 1
12 2 2
13 3 3
14 4 4
15 5 5
16 6 6
17 7 7
18 8 8
19 9 9
20 10 10
21 1 1
22 2 2
23 3 3
24 4 4
25 5 5
26 6 6
27 7 7
28 8 8
29 9 9
30 10 10

How to generate new variables based on the name of the variables in the data frame

For example, I have a toy dataset as the one I created below,
a1<-1:10
a2<-11:20
v<-c(1,2,1,NA,2,1,2,1,2,1)
data<-data.frame(a1,a2,v,stringsAsFactors = F)
Then I want to create a new variable y that takes its value from a1 or a2, or is NA, depending on the value of v. So y should equal 1 12 3 NA 15 6 17 8 19 10.
I want to generate it with a command similar to the ones below, but they don't work, I guess because of a vectorization issue. How can I fix it?
In reality, I have several a columns, say 10, and the actual values are characters instead of numeric ones.
data$y[!is.na(data$v)]<-data[,paste0('a',data$v)]
or
data%>%
mutate(y=ifelse(!is.na(v),get(paste0('a',v)),NA))

You could use standard indexing with cbind for that:
dat$y <- dat[cbind(1:nrow(dat), dat$v)]
The result:
> dat
a1 a2 v y
1 1 11 1 1
2 2 12 2 12
3 3 13 1 3
4 4 14 NA NA
5 5 15 2 15
6 6 16 1 6
7 7 17 2 17
8 8 18 1 8
9 9 19 2 19
10 10 20 1 10
(I used dat instead of data, because it is not wise to call a dataframe the same as a function; see ?data)
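For readers unfamiliar with this form of indexing: when the index is a two-column matrix, each row of it is read as a (row, column) pair. A minimal sketch on a plain matrix (m and idx are names made up for illustration):

```r
m <- matrix(1:6, nrow = 3)
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6

# Each row of idx picks one cell: (1,1), (2,2), (3,1)
idx <- cbind(1:3, c(1, 2, 1))
m[idx]
# 1 5 3

# An NA in the index matrix gives NA for that position
picked_na <- m[cbind(1:3, c(1, NA, 2))]
picked_na
# 1 NA 6
```

This is why the NA in v carries through to y without any special handling.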

The only idea that comes to mind:
data%>%
mutate(y=ifelse(!is.na(v),paste0('a',v),NA)) %>%
mutate(z=ifelse(!is.na(y),(ifelse(y=="a1",get("a1"),get("a2"))),NA))
a1 a2 v y z
1 1 11 1 a1 1
2 2 12 2 a2 12
3 3 13 1 a1 3
4 4 14 NA <NA> NA
5 5 15 2 a2 15
6 6 16 1 a1 6
7 7 17 2 a2 17
8 8 18 1 a1 8
9 9 19 2 a2 19
10 10 20 1 a1 10
or more directly:
data%>%
mutate(y=ifelse(!is.na(v),(ifelse(v==1, get("a1"),get("a2"))),NA))
a1 a2 v y
1 1 11 1 1
2 2 12 2 12
3 3 13 1 3
4 4 14 NA NA
5 5 15 2 15
6 6 16 1 6
7 7 17 2 17
8 8 18 1 8
9 9 19 2 19
10 10 20 1 10
still based on ifelse :(

You need to use a matrix accessor:
# Get the indices of non-missing values
ind <- which(!is.na(data$v))
# Transform colnames to indices
tab <- structure(match(c("a1", "a2"), names(data)), .Names = c("a1", "a2"))
# Access data with a matrix accessor
data$y[ind] <- data[cbind(ind, tab[paste0('a', data$v[ind])])]
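Since the question mentions that the real values are characters, here is a hedged sketch of the same name-based lookup on character columns. The data frame dat, its contents, and the name col_idx are made up for illustration; the sketch assumes the source columns are named "a" followed by the value of v.

```r
dat <- data.frame(a1 = c("a", "b", "c", "d", "e"),
                  a2 = c("A", "B", "C", "D", "E"),
                  v  = c(1, 2, NA, 2, 1),
                  stringsAsFactors = FALSE)

# Translate v into column positions by name; a missing v becomes "aNA",
# which matches no column and therefore yields NA
col_idx <- match(paste0("a", dat$v), c("a1", "a2"))

# Index a character matrix of the candidate columns with (row, column) pairs
dat$y <- as.matrix(dat[c("a1", "a2")])[cbind(seq_len(nrow(dat)), col_idx)]
dat$y
# "a" "B" NA "D" "e"
```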

Vectorized Conditional Random Matching

I want to create conditional random pairs without using for-loops so I can use the code with large datasets. At first, I create rows with unique IDs and randomly assign two different "types" to my rows:
df<-data.frame(id=1:10,type=NA,partner=NA)
df[sample(df$id,nrow(df)/2),"type"]<-1 ##random 50% type 1
df[which(is.na(df$type)==TRUE),"type"]<-2 ##other 50% type 2
df
id type partner
1 1 2 NA
2 2 1 NA
3 3 1 NA
4 4 1 NA
5 5 2 NA
6 6 1 NA
7 7 1 NA
8 8 2 NA
9 9 2 NA
10 10 2 NA
Now I want them to receive a random partner of the opposite type. So I randomize my type 1 IDs and match them to some type 2 IDs like so:
df$partner[which(df$type==2)]<-sample(df$id[which(df$type==1)],
nrow(df)/2)
df
id type partner
1 1 2 4
2 2 1 NA
3 3 1 NA
4 4 1 NA
5 5 2 2
6 6 1 NA
7 7 1 NA
8 8 2 6
9 9 2 3
10 10 2 7
And that's where I'm stuck. For some reason I can't think of a vectorized way to tell R "take the IDs of type 1, look where these IDs are in df$partner and return the corresponding row ID as df$partner instead of NA".
One example for a for-loop for conditional random pairing can be found here: click
I'm pretty sure that that's very basic and doable, however, any help appreciated!

Presumably, you want the type 1 and type 2 matched together to have each other's id in their respective partner entries. Fully vectorized solution.
# Define number of ids
n = 100
# Generate starting data frame
df = data.frame(id = 1:n, type = NA, partner = NA)
# Generate the type column
df$type[(a<-sample(df$id, n/2))] = 1
df$type[(b<-setdiff(df$id, a))] = 2
# Select a random partner id from the other type
df$partner[a] = sample(df$id[b])
# Fill in partner values based on previous line
df$partner[b] = df$id[match(df$id[b], df$partner)]
Output (30 rows are shown, so this run evidently used n = 30; the exact values vary with the random draw):
id type partner
1 2 11
2 1 13
3 2 19
4 2 10
5 1 17
6 2 28
7 2 27
8 2 21
9 1 22
10 1 4
11 1 1
12 2 20
13 2 2
14 2 25
15 2 24
16 2 30
17 2 5
18 2 29
19 1 3
20 1 12
21 1 8
22 2 9
23 2 26
24 1 15
25 1 14
26 1 23
27 1 7
28 1 6
29 1 18
30 1 16
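A compact, reproducible variant of the same idea with two sanity checks is sketched below. The seed value and n = 10 are assumptions chosen only so the run can be repeated; the checks rely on id being equal to the row number, as in the example.

```r
set.seed(42)  # assumed seed, only for reproducibility
n <- 10       # assumed to be even so both types have the same size

df <- data.frame(id = seq_len(n), type = NA, partner = NA)

# Randomly split the ids into two equally sized groups
a <- sample(df$id, n / 2)
b <- setdiff(df$id, a)
df$type[a] <- 1
df$type[b] <- 2

# Give every type 1 row a distinct, randomly chosen type 2 partner
df$partner[a] <- sample(df$id[b])

# For each type 2 id, find the row that lists it as partner
df$partner[b] <- df$id[match(df$id[b], df$partner)]

# Check 1: the partner of my partner is me
all(df$partner[df$partner] == df$id)
# Check 2: partners always have opposite types
all(df$type != df$type[df$partner])
```

Both checks should return TRUE for any seed, since the pairing is built as a one-to-one mapping between the two groups.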
