I need to recode a data set of test responses for use in another application (a program called BLIMP that imputes missing values). Specifically, I need to represent the test items and subscale assignments with dummy codes.
Here I create a data frame that holds the responses to a 10-item test for two persons in a nested format. These data are a simplified version of the actual input table.
library(tidyverse)
df <- tibble(
  person = rep(101:102, each = 10),
  item = as.factor(rep(1:10, 2)),
  response = sample(1:4, 20, replace = TRUE),
  scale = as.factor(rep(rep(1:2, each = 5), 2))
) %>%
  mutate(
    scale_last = case_when(
      as.integer(scale) != lead(as.integer(scale)) |
        is.na(lead(as.integer(scale))) ~ 1,
      TRUE ~ NA_real_
    )
  )
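One aside: response is drawn with sample(), so the exact values in the output below will differ between runs unless a seed is set before building df, e.g.:

set.seed(123)  # hypothetical value; any seed makes the sampled responses reproducible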
The columns of df contain:
person: ID numbers for the persons (10 rows for each person)
item: test items 1-10 for each person. Note how the items are nested within each person.
response: score for each item
scale: the test has two subscales. Items 1-5 are assigned to subscale 1, and items 6-10 are assigned to subscale 2.
scale_last: a code of 1 in this column indicates that the item is the last item in its assigned subscale. This characteristic becomes important below.
I then create dummy codes for the items using the recipes package.
library(recipes)
dum <- df %>%
  recipe(~ .) %>%
  step_dummy(item, one_hot = TRUE) %>%
  prep(training = df) %>%
  bake(new_data = df)
print(dum, width = Inf)
# person response scale scale_last item_X1 item_X2 item_X3 item_X4 item_X5 item_X6 item_X7
# <int> <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 101 2 1 NA 1 0 0 0 0 0 0
# 2 101 3 1 NA 0 1 0 0 0 0 0
# 3 101 3 1 NA 0 0 1 0 0 0 0
# 4 101 1 1 NA 0 0 0 1 0 0 0
# 5 101 1 1 1 0 0 0 0 1 0 0
# 6 101 1 2 NA 0 0 0 0 0 1 0
# 7 101 3 2 NA 0 0 0 0 0 0 1
# 8 101 4 2 NA 0 0 0 0 0 0 0
# 9 101 2 2 NA 0 0 0 0 0 0 0
#10 101 4 2 1 0 0 0 0 0 0 0
#11 102 2 1 NA 1 0 0 0 0 0 0
#12 102 1 1 NA 0 1 0 0 0 0 0
#13 102 2 1 NA 0 0 1 0 0 0 0
#14 102 3 1 NA 0 0 0 1 0 0 0
#15 102 2 1 1 0 0 0 0 1 0 0
#16 102 1 2 NA 0 0 0 0 0 1 0
#17 102 4 2 NA 0 0 0 0 0 0 1
#18 102 2 2 NA 0 0 0 0 0 0 0
#19 102 4 2 NA 0 0 0 0 0 0 0
#20 102 3 2 1 0 0 0 0 0 0 0
# item_X8 item_X9 item_X10
# <dbl> <dbl> <dbl>
# 1 0 0 0
# 2 0 0 0
# 3 0 0 0
# 4 0 0 0
# 5 0 0 0
# 6 0 0 0
# 7 0 0 0
# 8 1 0 0
# 9 0 1 0
#10 0 0 1
#11 0 0 0
#12 0 0 0
#13 0 0 0
#14 0 0 0
#15 0 0 0
#16 0 0 0
#17 0 0 0
#18 1 0 0
#19 0 1 0
#20 0 0 1
The output shows the item dummy codes represented in the columns with the item_ prefix. For downstream processing, I need a further level of recoding. Within each subscale, the items must be dummy-coded relative to the last item of the subscale. Here’s where the scale_last variable comes into play; this variable identifies the rows in the output that need to be recoded.
For example, the first of these rows is row 5, the row for the last item (item 5) in subscale 1 for person 101. In this row the value of column item_X5 needs to be recoded from 1 to 0. In the next row to be recoded (row 10), it is the value of item_X10 that needs to be recoded from 1 to 0. And so on.
I'm struggling to find the right combination of dplyr verbs to accomplish this. What's tripping me up is the need to isolate specific cells within specific rows for recoding.
Thanks in advance for any help!
We can use mutate_at with replace to set the values of the "item" columns to 0 where scale_last == 1:
library(dplyr)
dum %>% mutate_at(vars(starts_with("item")), ~replace(., scale_last == 1, 0))
# A tibble: 20 x 14
# person response scale scale_last item_X1 item_X2 item_X3 item_X4 item_X5
# <int> <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 101 2 1 NA 1 0 0 0 0
# 2 101 3 1 NA 0 1 0 0 0
# 3 101 1 1 NA 0 0 1 0 0
# 4 101 1 1 NA 0 0 0 1 0
# 5 101 3 1 1 0 0 0 0 0
# 6 101 4 2 NA 0 0 0 0 0
# 7 101 4 2 NA 0 0 0 0 0
# 8 101 3 2 NA 0 0 0 0 0
# 9 101 2 2 NA 0 0 0 0 0
#10 101 4 2 1 0 0 0 0 0
#11 102 2 1 NA 1 0 0 0 0
#12 102 1 1 NA 0 1 0 0 0
#13 102 4 1 NA 0 0 1 0 0
#14 102 4 1 NA 0 0 0 1 0
#15 102 4 1 1 0 0 0 0 0
#16 102 3 2 NA 0 0 0 0 0
#17 102 4 2 NA 0 0 0 0 0
#18 102 1 2 NA 0 0 0 0 0
#19 102 4 2 NA 0 0 0 0 0
#20 102 4 2 1 0 0 0 0 0
# … with 5 more variables: item_X6 <dbl>, item_X7 <dbl>, item_X8 <dbl>,
# item_X9 <dbl>, item_X10 <dbl>
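As an aside, in dplyr 1.0 and later mutate_at is superseded by across; an equivalent call, assuming that version, would be:

dum %>%
  mutate(across(starts_with("item"), ~ replace(.x, scale_last == 1, 0)))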
In base R, we can use lapply:
cols <- grep("^item", names(dum))  # positions of the item dummy columns
dum[cols] <- lapply(dum[cols], function(x) replace(x, dum$scale_last == 1, 0))
I have a column that contains binary values indicating the presence (1) or absence (0) of an event. Based on this column I want to create a new column containing a continuous count that assigns a single count to groups of adjacent events.
event <- c(0,0,0,1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,1,1,0,0)
count<- c(0,0,0,1,0,0,0,2,2,2,2,2,0,0,0,0,0,0,3,3,0,0)
df <- data.frame(event, count)
The desired count should look like this:
event count
0 0
0 0
0 0
1 1
0 0
0 0
0 0
1 2
1 2
1 2
1 2
1 2
0 0
0 0
0 0
0 0
0 0
0 0
1 3
1 3
0 0
0 0
Any suggestions how to get there are much appreciated. Thank you!
With dplyr, the following checks whether there is a 1 following a 0 and takes the cumulative sum of that. Then, the result is multiplied by event such that the zeros are maintained.
library(dplyr)
df %>%
  mutate(count_2 = event * cumsum(event == 1 & lag(event, default = 0) == 0))
gives
event count count_2
1 0 0 0
2 0 0 0
3 0 0 0
4 1 1 1
5 0 0 0
6 0 0 0
7 0 0 0
8 1 2 2
9 1 2 2
10 1 2 2
11 1 2 2
12 1 2 2
13 0 0 0
14 0 0 0
15 0 0 0
16 0 0 0
17 0 0 0
18 0 0 0
19 1 3 3
20 1 3 3
21 0 0 0
22 0 0 0
A base-R variant:
df$count_2 <- df$event * cumsum(c(0, diff(df$event)==1))
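One caveat: the leading 0 means a run that starts at row 1 would not be counted. Seeding the cumsum with the first event value instead covers that edge case:

df$count_2 <- df$event * cumsum(c(df$event[1], diff(df$event) == 1))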
Using rle in base R:
df$count1 <- with(df, event * with(rle(event == 1),rep(cumsum(values), lengths)))
df
# event count count1
#1 0 0 0
#2 0 0 0
#3 0 0 0
#4 1 1 1
#5 0 0 0
#6 0 0 0
#7 0 0 0
#8 1 2 2
#9 1 2 2
#10 1 2 2
#11 1 2 2
#12 1 2 2
#13 0 0 0
#14 0 0 0
#15 0 0 0
#16 0 0 0
#17 0 0 0
#18 0 0 0
#19 1 3 3
#20 1 3 3
#21 0 0 0
#22 0 0 0
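If rle is unfamiliar, unrolling the one-liner shows what each piece contributes:

r <- rle(df$event == 1)                      # runs of consecutive equal values
cumsum(r$values)                             # counter that advances only on TRUE (event) runs
rep(cumsum(r$values), r$lengths)             # expand the run counter back to row level
df$event * rep(cumsum(r$values), r$lengths)  # zero out the non-event rows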
I have a data frame where the data are grouped by ID. I need to know how many cells make up 10% of each group so I can draw a sample of that size, but the sample should only select cells where EP is 1.
I've tried a nested for loop: one loop to compute the number of cells that make up 10% of each group, and an outer loop to sample that many cells meeting the condition EP == 1.
x <- data.frame("ID"=rep(1:2, each=10),"EP" = rep(0:1, times=10))
x
ID EP
1 1 0
2 1 1
3 1 0
4 1 1
5 1 0
6 1 1
7 1 0
8 1 1
9 1 0
10 1 1
11 2 0
12 2 1
13 2 0
14 2 1
15 2 0
16 2 1
17 2 0
18 2 1
19 2 0
20 2 1
for (j in 1:1000) {
  for (i in 1:nrow(x)) {
    d <- x[x$ID == i, ]
    npix <- 10 * nrow(d) / 100
  }
  r <- sample(d[d$EP == 1, ], npix)
  print(r)
}
data frame with 0 columns and 0 rows
data frame with 0 columns and 0 rows
data frame with 0 columns and 0 rows
.
.
.
until 1000
I would like to get this data frame, where each sample is a new column in x and each sampled cell is marked with 1:
ID EP s1 s2....s1000
1 1 0 0 0 ....
2 1 1 0 1
3 1 0 0 0
4 1 1 0 0
5 1 0 0 0
6 1 1 0 0
7 1 0 0 0
8 1 1 0 0
9 1 0 0 0
10 1 1 1 0
11 2 0 0 0
12 2 1 0 0
13 2 0 0 0
14 2 1 0 1
15 2 0 0 0
16 2 1 0 0
17 2 0 0 0
18 2 1 1 0
19 2 0 0 0
20 2 1 0 0
Note that each 1 in s1 and s2 marks a sampled cell, corresponding to 10% of the cells in each group (1, 2) that meet the condition EP == 1.
You can try:
set.seed(1231)
x <- data.frame("ID" = rep(1:2, each = 10), "EP" = rep(0:1, times = 10))
library(tidyverse)
x %>%
  group_by(ID) %>%
  mutate(index = ifelse(EP == 1, 1:n(), 0)) %>%
  mutate(s1 = ifelse(index %in% sample(index[index != 0], n() * 0.1), 1, 0)) %>%
  mutate(s2 = ifelse(index %in% sample(index[index != 0], n() * 0.1), 1, 0))
# A tibble: 20 x 5
# Groups: ID [2]
ID EP index s1 s2
<int> <int> <dbl> <dbl> <dbl>
1 1 0 0 0 0
2 1 1 2 0 0
3 1 0 0 0 0
4 1 1 4 0 0
5 1 0 0 0 0
6 1 1 6 1 1
7 1 0 0 0 0
8 1 1 8 0 0
9 1 0 0 0 0
10 1 1 10 0 0
11 2 0 0 0 0
12 2 1 2 0 0
13 2 0 0 0 0
14 2 1 4 0 1
15 2 0 0 0 0
16 2 1 6 0 0
17 2 0 0 0 0
18 2 1 8 0 0
19 2 0 0 0 0
20 2 1 10 1 0
We can write a function that, within each ID, places 1s at a random 10% of the rows where EP == 1.
library(dplyr)
rep_func <- function() {
  x %>%
    group_by(ID) %>%
    mutate(s1 = 0,
           s1 = replace(s1, sample(which(EP == 1), floor(0.1 * n())), 1)) %>%
    pull(s1)
}
Then use replicate to repeat it n times:
n <- 5
x[paste0("s", seq_len(n))] <- replicate(n, rep_func())
x
# ID EP s1 s2 s3 s4 s5
#1 1 0 0 0 0 0 0
#2 1 1 0 0 0 0 0
#3 1 0 0 0 0 0 0
#4 1 1 0 0 0 0 0
#5 1 0 0 0 0 0 0
#6 1 1 1 0 0 1 0
#7 1 0 0 0 0 0 0
#8 1 1 0 1 0 0 0
#9 1 0 0 0 0 0 0
#10 1 1 0 0 1 0 1
#11 2 0 0 0 0 0 0
#12 2 1 0 0 1 0 0
#13 2 0 0 0 0 0 0
#14 2 1 1 1 0 0 0
#15 2 0 0 0 0 0 0
#16 2 1 0 0 0 0 1
#17 2 0 0 0 0 0 0
#18 2 1 0 0 0 1 0
#19 2 0 0 0 0 0 0
#20 2 1 0 0 0 0 0
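If you need this elsewhere, a parameterized sketch (the names rep_func2 and prop are hypothetical) avoids relying on the global x and the hard-coded 10%:

rep_func2 <- function(dat, prop = 0.1) {
  dat %>%
    group_by(ID) %>%
    mutate(s = replace(rep(0, n()),
                       sample(which(EP == 1), floor(prop * n())), 1)) %>%
    pull(s)
}
x[paste0("s", 1:5)] <- replicate(5, rep_func2(x))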
I have the following (simulated) dataset
m <- 500
n <- 8
df <- data.frame(matrix(sample(0:1, m * n, replace = TRUE), m, n))
df$ID <- c(1:20)
attach(df)
df <- df[order(ID), ]
df$round <- c(1:25)
df$payoff <- runif(n = 500, min = 1e-12, max = .9999999999)
First, I want a for loop that compares each row with the one before, so that the output takes the value 1 if the payoff of the row is greater than the payoff of the previous row. Then, I want the row with the highest payoff found so far to serve as the reference for the following rows, so that the output now takes the value 1 if the next row's payoff is greater than that running maximum. The reference needs to be updated as soon as a new highest value is found.
I managed to build a loop for the first step
df_split <- split(df, df$ID)
y <- data.frame("ID" = NULL, "round" = NULL, "feedback" = NULL)
for (i in 1:length(df_split)) {
  myvector <- as.matrix(df_split[[i]][-1:-10])
  for (j in 2:nrow(myvector)) {
    feedb <- ifelse(myvector[j, ] > myvector[j - 1, ], 1, 0)
    df2 <- data.frame("ID" = i, "round" = j, "feedback" = feedb)
    y <- rbind(y, df2)
  }
}
Now I want to add the second step to the loop: designating the row with the highest payoff found so far as the reference, and comparing the next row against it. As mentioned above, this reference needs to be updated as soon as a new highest value is found.
Does anybody have a solution?
Thank you for all your help!
EDIT:
Thank you both @r2evans and @Jon_Spring for your suggestions!
The reason why I am using a loop is that I need to calculate the output for each ID independently (sorry, I forgot to mention).
This is also why I am splitting the original dataframe into 20 dataframes (one per ID).
If I understand your solutions correctly, when the code reaches ID = 2, the running highest payoff still comes from ID = 1; the same happens for ID = 3, ID = 4, and so forth. The resulting output is then incorrect, because the calculation should restart for each ID.
I didn't know the function cummax, thank you again! I'll try to integrate it into the logic of my loop, which also gives an output column as I need it.
I don't think you need any loops.
Up front, for reproducibility, I set my random seed with set.seed(1) before generating the frame above. This allows you to see the "exact same" frame as I'm creating below.
head(within(df, {
isbetter <- c(TRUE, diff(payoff) > 0)
maxsofar <- cummax(df$payoff)
maxsofar <- c(0, maxsofar[-length(maxsofar)])
isbestsofar <- as.integer(payoff > maxsofar)
}), n=20)
# X1 X2 X3 X4 X5 X6 X7 X8 ID round payoff isbestsofar maxsofar isbetter
# 1 0 1 1 0 1 1 1 1 1 1 0.18776846 1 0.0000000 TRUE
# 21 1 1 0 0 0 0 0 1 1 2 0.50475902 1 0.1877685 TRUE
# 41 1 0 0 0 0 1 0 0 1 3 0.02728685 0 0.5047590 FALSE
# 61 1 1 0 0 0 0 1 0 1 4 0.49629785 0 0.5047590 TRUE
# 81 0 0 0 0 1 1 1 0 1 5 0.94735171 1 0.5047590 TRUE
# 101 1 1 1 0 1 1 0 1 1 6 0.38118213 0 0.9473517 FALSE
# 121 1 1 0 1 0 0 1 0 1 7 0.69821373 0 0.9473517 TRUE
# 141 1 1 0 0 1 0 1 1 1 8 0.68876581 0 0.9473517 FALSE
# 161 0 0 0 0 1 0 0 0 1 9 0.47773068 0 0.9473517 FALSE
# 181 0 1 0 1 1 0 0 1 1 10 0.27334761 0 0.9473517 FALSE
# 201 0 1 0 1 1 0 1 0 1 11 0.75691633 0 0.9473517 TRUE
# 221 0 0 1 1 1 0 1 0 1 12 0.24753206 0 0.9473517 FALSE
# 241 0 0 0 1 0 1 1 0 1 13 0.52133948 0 0.9473517 TRUE
# 261 1 1 0 0 1 0 0 0 1 14 0.61284324 0 0.9473517 TRUE
# 281 0 1 0 1 1 0 1 0 1 15 0.09504998 0 0.9473517 FALSE
# 301 1 1 1 0 0 1 0 0 1 16 0.56575876 0 0.9473517 TRUE
# 321 1 0 1 1 0 1 1 1 1 17 0.01687416 0 0.9473517 FALSE
# 341 1 1 0 1 0 1 0 1 1 18 0.19987888 0 0.9473517 TRUE
# 361 0 0 1 1 1 0 0 1 1 19 0.41758380 0 0.9473517 TRUE
# 381 0 0 1 0 1 1 0 0 1 20 0.20550609 0 0.9473517 FALSE
I use within for simple creation/processing of columns within the data.frame; this could easily be done verbatim df$isbetter <- c(TRUE, diff(df$payoff) > 0), with dplyr, with data.table, or likely in other ways too. Take your pick, the logic and outcome should be effectively the same (other than column order, perhaps).
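Given your edit about restarting per ID, a grouped dplyr sketch of the same logic (untested against your exact seed) would be:

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(
    isbetter    = c(TRUE, diff(payoff) > 0),
    maxsofar    = lag(cummax(payoff), default = 0),
    isbestsofar = as.integer(payoff > maxsofar)
  ) %>%
  ungroup()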
df$cummax = cummax(df$payoff)
df$new_max = df$payoff==df$cummax
Edit: added group_by, dplyr pipe
library(dplyr)
df2 <- df %>%
  group_by(ID) %>%
  mutate(cummax = cummax(payoff),
         new_max = payoff == cummax) %>%
  ungroup()
Output, showing what happens when we get to a new ID:
> df2[20:30,]
# A tibble: 11 x 13
X1 X2 X3 X4 X5 X6 X7 X8 ID round payoff cummax new_max
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <lgl>
1 0 0 1 0 1 1 0 0 1 20 0.206 0.947 FALSE
2 1 1 0 1 0 0 1 0 1 21 0.377 0.947 FALSE
3 0 0 1 0 0 0 1 0 1 22 0.0765 0.947 FALSE
4 0 0 1 1 0 0 0 0 1 23 0.145 0.947 FALSE
5 0 0 0 1 0 0 1 0 1 24 0.554 0.947 FALSE
6 1 0 0 0 1 1 1 1 1 25 0.662 0.947 FALSE
7 0 1 1 1 1 0 0 1 2 1 0.736 0.736 TRUE
8 0 1 1 1 1 0 0 0 2 2 0.376 0.736 FALSE
9 1 1 0 0 0 0 0 0 2 3 0.869 0.869 TRUE
10 0 0 1 1 1 0 1 1 2 4 0.795 0.869 FALSE
11 1 1 0 1 1 1 0 1 2 5 0.822 0.869 FALSE
Appreciate your help. I need to split a column of delimited values into columns named after those values, with each new column filled with 1 or 0 depending on whether the value is present in that row.
state <-
c('ACT',
'ACT|NSW|NT|QLD|SA|VIC',
'ACT|NSW|NT|QLD|TAS|VIC|WA',
'ACT|NSW|NT|SA|TAS|VIC',
'ACT|NSW|QLD|VIC',
'ACT|NSW|SA',
'ACT|NSW|NT|QLD|TAS|VIC|WA|SA',
'NSW',
'NT',
'NT|SA',
'QLD',
'SA',
'TAS',
'VIC',
'WA')
df <- data.frame(id = 1:length(state),state)
id state
1 1 ACT
2 2 ACT|NSW|NT|QLD|SA|VIC
3 3 ACT|NSW|NT|QLD|TAS|VIC|WA
4 4 ACT|NSW|NT|SA|TAS|VIC
...
The desired result is a data frame with the same number of rows plus additional columns, one per state, populated with 1 or 0 for each row.
tq,
James
You can do something like this:
library(tidyr)
library(dplyr)
df %>%
  separate_rows(state) %>%
  unique() %>%   # in case you have duplicated states for a single id
  mutate(exist = 1) %>%
  spread(state, exist, fill = 0)
# id ACT NSW NT QLD SA TAS VIC WA
#1 1 1 0 0 0 0 0 0 0
#2 2 1 1 1 1 1 0 1 0
#3 3 1 1 1 1 0 1 1 1
#4 4 1 1 1 0 1 1 1 0
#5 5 1 1 0 1 0 0 1 0
#6 6 1 1 0 0 1 0 0 0
#7 7 1 1 1 1 1 1 1 1
#8 8 0 1 0 0 0 0 0 0
#9 9 0 0 1 0 0 0 0 0
#10 10 0 0 1 0 1 0 0 0
#11 11 0 0 0 1 0 0 0 0
#12 12 0 0 0 0 1 0 0 0
#13 13 0 0 0 0 0 1 0 0
#14 14 0 0 0 0 0 0 1 0
#15 15 0 0 0 0 0 0 0 1
separate_rows splits state and converts the data frame to long format;
mutate adds a constant-value column for reshaping purposes;
spread transforms the result to wide format.
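With tidyr 1.0 and later, where spread is superseded, the same pipeline can be written with pivot_wider (assuming tidyr >= 1.1 for the scalar values_fill):

df %>%
  separate_rows(state) %>%
  distinct() %>%
  mutate(exist = 1) %>%
  pivot_wider(names_from = state, values_from = exist, values_fill = 0)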
Here is a base R option: split the 'state' column on |, convert the list of vectors into a two-column data.frame with stack, get the frequencies with table, and cbind the result with the first column of 'df'.
cbind(df[1], as.data.frame.matrix(table(stack(setNames(strsplit(as.character(df$state),
"[|]"), df$id))[2:1])))
# id ACT NSW NT QLD SA TAS VIC WA
#1 1 1 0 0 0 0 0 0 0
#2 2 1 1 1 1 1 0 1 0
#3 3 1 1 1 1 0 1 1 1
#4 4 1 1 1 0 1 1 1 0
#5 5 1 1 0 1 0 0 1 0
#6 6 1 1 0 0 1 0 0 0
#7 7 1 1 1 1 1 1 1 1
#8 8 0 1 0 0 0 0 0 0
#9 9 0 0 1 0 0 0 0 0
#10 10 0 0 1 0 1 0 0 0
#11 11 0 0 0 1 0 0 0 0
#12 12 0 0 0 0 1 0 0 0
#13 13 0 0 0 0 0 1 0 0
#14 14 0 0 0 0 0 0 1 0
#15 15 0 0 0 0 0 0 0 1
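Unrolled into steps, the nested call does the following:

s    <- strsplit(as.character(df$state), "[|]")  # list of state vectors, one per id
long <- stack(setNames(s, df$id))                # long format: values = state, ind = id
tab  <- table(long[2:1])                         # id x state contingency table
cbind(df[1], as.data.frame.matrix(tab))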
I have data.frames of counts such as:
a <- data.frame(id = 1:10,
                "1" = c(rep(1, 3), rep(0, 7)),
                "3" = c(rep(0, 4), rep(1, 6)))
names(a)[2:3] <- c("1", "3")
a
id 1 3
1 1 1 0
2 2 1 0
3 3 1 0
4 4 0 0
5 5 0 1
6 6 0 1
7 7 0 1
8 8 0 1
9 9 0 1
10 10 0 1
and a template data.frame such as
m <- data.frame(id = 1:10,
                "1" = rep(0, 10),
                "2" = rep(0, 10),
                "3" = rep(0, 10),
                "4" = rep(0, 10))
names(m)[-1] <- 1:4
m
id 1 2 3 4
1 1 0 0 0 0
2 2 0 0 0 0
3 3 0 0 0 0
4 4 0 0 0 0
5 5 0 0 0 0
6 6 0 0 0 0
7 7 0 0 0 0
8 8 0 0 0 0
9 9 0 0 0 0
10 10 0 0 0 0
and I want to add the values of a into the template m in the appropriate columns, leaving the rest as 0. The following works, but I would like to know if there is a more elegant way, perhaps using plyr or data.table:
library(plyr)  # for rbind.fill
provi <- rbind.fill(a, m)
provi[is.na(provi)] <- 0
mnew <- aggregate(provi[, -1], by = list(provi$id), FUN = sum)
names(mnew)[1] <- "id"
mnew <- mnew[c(1, order(names(mnew)[-1]) + 1)]
mnew
id 1 2 3 4
1 1 1 0 0 0
2 2 1 0 0 0
3 3 1 0 0 0
4 4 0 0 0 0
5 5 0 0 1 0
6 6 0 0 1 0
7 7 0 0 1 0
8 8 0 0 1 0
9 9 0 0 1 0
10 10 0 0 1 0
I guess the concise option would be:
m[names(a)] <- a
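One hedge on that: it relies on a and m having their rows aligned by position. If the ids could be in a different order, match on id first:

m[match(a$id, m$id), names(a)] <- a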
Or we match the column names ('i1'), use that with max.col to create a column index, and cbind it with the row index to get 'i2'; a similar step creates 'i3'. We then replace the values of 'm' at 'i2' with the values of 'a' at 'i3'.
i1 <- match(names(a)[-1], names(m)[-1])
i2 <- cbind(m$id, i1[max.col(a[-1], 'first')]+1L)
i3 <- cbind(a$id, max.col(a[-1], 'first')+1L)
m[i2] <- a[i3]
m
# id 1 2 3 4
#1 1 1 0 0 0
#2 2 1 0 0 0
#3 3 1 0 0 0
#4 4 0 0 0 0
#5 5 0 0 1 0
#6 6 0 0 1 0
#7 7 0 0 1 0
#8 8 0 0 1 0
#9 9 0 0 1 0
#10 10 0 0 1 0
A data.table option would be melt/dcast
library(data.table)
dcast(melt(setDT(a), id.var = 'id')[, variable := factor(variable, levels = 1:4)],
      id ~ variable, value.var = 'value', drop = FALSE, fill = 0)
# id 1 2 3 4
# 1: 1 1 0 0 0
# 2: 2 1 0 0 0
# 3: 3 1 0 0 0
# 4: 4 0 0 0 0
# 5: 5 0 0 1 0
# 6: 6 0 0 1 0
# 7: 7 0 0 1 0
# 8: 8 0 0 1 0
# 9: 9 0 0 1 0
#10: 10 0 0 1 0
A similar dplyr/tidyr option would be
library(dplyr)
library(tidyr)
gather(a, Var, Val, -id) %>%
  mutate(Var = factor(Var, levels = 1:4)) %>%
  spread(Var, Val, drop = FALSE, fill = 0)
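or, with tidyr 1.0 and later where gather/spread are superseded, a sketch using pivot_longer/pivot_wider plus complete to keep the empty factor levels:

pivot_longer(a, -id, names_to = "Var", values_to = "Val") %>%
  mutate(Var = factor(Var, levels = 1:4)) %>%
  complete(id, Var, fill = list(Val = 0)) %>%
  pivot_wider(names_from = Var, values_from = Val)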
You could use merge, too:
res <- suppressWarnings(merge(a, m, by="id", suffixes = c("", "")))
(res[, which(!duplicated(names(res)))][, names(m)])
# id 1 2 3 4
# 1 1 1 0 0 0
# 2 2 1 0 0 0
# 3 3 1 0 0 0
# 4 4 0 0 0 0
# 5 5 0 0 1 0
# 6 6 0 0 1 0
# 7 7 0 0 1 0
# 8 8 0 0 1 0
# 9 9 0 0 1 0
# 10 10 0 0 1 0