Cross tabulate multiple response questions - r

I need to cross tabulate multiple responses (stored as a set of variables) by a grouping variable. My survey question is: "Which of the following fruits have you had?" The respondent from either geographical Area 1 or Area 2 is then given a list with "1. Orange, 2. Mango, ..." and the resulting data from the yes (1) or no (0) questions is:
set.seed(1)
df <- data.frame(area=rep(c('Area 1','Area 2'), each=6),
var_orange=sample(0:1, 12, T),
var_banana=sample(0:1, 12, T),
var_melon=sample(0:1, 12, T),
var_mango=sample(0:1, 12, T))
     area var_orange var_banana var_melon var_mango
1  Area 1          0          1         0         1
2  Area 1          0          0         0         0
3  Area 1          1          1         0         1
4  Area 1          1          0         0         0
5  Area 1          0          1         1         1
6  Area 1          1          1         0         1
7  Area 2          1          0         0         1
8  Area 2          1          1         1         1
9  Area 2          1          1         0         1
10 Area 2          0          0         0         1
11 Area 2          0          1         1         0
12 Area 2          0          0         1         0
I would like to get a summary output like this one, generated in Stata:
            |          area
            |    Area 1      Area 2  |      Total
------------+------------------------+-----------
 var_orange |     50.00       50.00  |      50.00
 var_banana |     66.67       50.00  |      58.33
  var_melon |     16.67       50.00  |      33.33
  var_mango |     66.67       66.67  |      66.67
------------+------------------------+-----------
      Total |    200.00      216.67  |     208.33
I found a related post with a multfreqtable function which gives a one-way summary for my data:
multfreqtable = function(data, question.prefix) {
  z = length(question.prefix)
  temp = vector("list", z)
  for (i in 1:z) {
    a = grep(question.prefix[i], names(data))
    b = sum(data[, a] != 0)
    d = colSums(data[, a])
    e = sum(rowSums(data[, a]) != 0)
    f = as.numeric(c(d, b))
    temp[[i]] = data.frame(question = c(sub(question.prefix[i],
                                            "", names(d)), "Total"),
                           freq = f,
                           percent_response = (f/b)*100,
                           percent_cases = round((f/e)*100, 2))
    names(temp)[i] = question.prefix[i]
  }
  temp
}
multfreqtable(df, "var_")
$var_
question freq percent_response percent_cases
1 orange 6 24 54.55
2 banana 7 28 63.64
3 melon 4 16 36.36
4 mango 8 32 72.73
5 Total 25 100 227.27
But I am interested in a two-way summary.
I could use dplyr as suggested in a post and get:
df %>%
  summarise(orange_pct = round(sum(var_orange, na.rm = TRUE)*100/n(), 2),
            banana_pct = round(sum(var_banana, na.rm = TRUE)*100/n(), 2),
            melon_pct  = round(sum(var_melon, na.rm = TRUE)*100/n(), 2),
            mango_pct  = round(sum(var_mango, na.rm = TRUE)*100/n(), 2))
orange_pct banana_pct melon_pct mango_pct
1 50 58.33 33.33 66.67
But I need a neater table output with marginal column frequencies.

You could first calculate the values using dplyr, then put them in a table using e.g. knitr::kable.
library(dplyr)
library(knitr)
set.seed(1)
df <- data.frame(area = rep(c('Area 1','Area 2'), each = 6),
var_orange = sample(0:1, 12, T),
var_banana = sample(0:1, 12, T),
var_melon = sample(0:1, 12, T),
var_mango = sample(0:1, 12, T))
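# across() replaces the deprecated summarise_each()/funs()
t1 <- df %>% group_by(area) %>% summarise(across(everything(), mean))  # per-area proportions
t2 <- df %>% summarise(area = NA, across(starts_with("var_"), mean))    # overall proportions, area left NA
kable(rbind(t1, t2))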
And you would get:
|area | var_orange| var_banana| var_melon| var_mango|
|:------|----------:|----------:|---------:|---------:|
|Area 1 | 0.5| 0.6666667| 0.1666667| 0.6666667|
|Area 2 | 0.5| 0.5000000| 0.5000000| 0.6666667|
|NA | 0.5| 0.5833333| 0.3333333| 0.6666667|
To further polish the output to mimic that from Stata:
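polished <- 100 * (rbind(t1, t2) %>%   # %>% binds tighter than *, so the parentheses just make this explicit
  select(-area) %>%                    # drop "area"
  mutate(Total = rowSums(.)) %>%       # per-area sum, which becomes the Total row after transposing
  as.matrix() %>% t())                 # transpose so the fruits become rows
kable(polished, digits = 2, col.names = c("Area 1", "Area 2", "Total"))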
The end result would be:
| | Area 1| Area 2| Total|
|:----------|------:|------:|------:|
|var_orange | 50.00| 50.00| 50.00|
|var_banana | 66.67| 50.00| 58.33|
|var_melon | 16.67| 50.00| 33.33|
|var_mango | 66.67| 66.67| 66.67|
|Total | 200.00| 216.67| 208.33|

A different solution using aggregate is
T1 = aggregate(df[,2:5], list(df$area), sum)
rownames(T1) = T1[,1]
T1 = t(T1[,-1])
T1 = addmargins(T1, 1:2, FUN = c(Total = sum), quiet=TRUE)
T1
Area 1 Area 2 Total
var_orange 3 3 6
var_banana 4 3 7
var_melon 1 3 4
var_mango 4 4 8
Total 12 13 25
Thanks to @rawr for suggesting the simplification of using addmargins.
If you want the table expressed as percentages instead of counts, divide by the total number of responses to get the fraction and then multiply by 100. Note that these are percentages of all 25 responses, not of respondents, so they differ from the Stata table.
T1 = aggregate(df[,2:5], list(df$area), sum)
rownames(T1) = T1[,1]
T1 = t(T1[,-1])
T1 = T1 * 100 / sum(T1)
T1 = addmargins(T1, FUN = c(Total = sum), quiet=TRUE)
T1
Area 1 Area 2 Total
var_orange 12 12 24
var_banana 16 12 28
var_melon 4 12 16
var_mango 16 16 32
Total 48 52 100
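To reproduce the Stata table itself, each cell has to be the percentage of respondents in that area who answered yes, which for 0/1 indicators is simply the column mean within the area. A minimal base-R sketch, assuming the data exactly as printed in the question (entered literally, because `sample()` results depend on the R version) and the hypothetical helper names `vars`, `by_area` and `tab`:

```r
# Data as printed in the question
df <- data.frame(
  area       = rep(c("Area 1", "Area 2"), each = 6),
  var_orange = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0),
  var_banana = c(1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0),
  var_melon  = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1),
  var_mango  = c(1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0)
)

vars <- grep("^var_", names(df), value = TRUE)

# Mean of a 0/1 indicator within an area = share of that area's respondents saying yes
by_area <- sapply(split(df[vars], df$area), colMeans) * 100
overall <- colMeans(df[vars]) * 100

tab <- cbind(by_area, Total = overall)   # add the Total column
tab <- rbind(tab, Total = colSums(tab))  # add the Total row (sums of percentages, as in Stata)
round(tab, 2)
```

Because the denominators are respondents rather than responses, the column totals exceed 100, matching the 200.00 / 216.67 / 208.33 row of the Stata output.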

Related

calculate frequency of unique values per group in R

How can I count the number of unique values such that I go from:
organisation <- c("A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D")
variable <- c("0","0","1","2","0","0","1","1","0","0","1","1","0","0","2","2")
df <- data.frame(organisation,variable)
organisation | variable
A | 0
A | 0
A | 1
A | 2
B | 0
B | 0
B | 1
B | 1
C | 0
C | 0
C | 1
C | 1
D | 0
D | 0
D | 2
D | 2
To:
unique_values | frequency
0,1,2 | 1
0,1 | 2
0,2 | 1
There are only 3 possible sequences:
0,1,2
0,1
0,2
Try this
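# `\(x)` is the shorthand function syntax introduced in R 4.1.0
s <- aggregate(. ~ organisation, data = df, \(x) names(table(x)))   # sorted distinct values per organisation
s$variable <- sapply(s$variable, \(x) paste0(x, collapse = ","))    # collapse each set to a string like "0,1,2"
setNames(aggregate(. ~ variable, data = s, length), c("unique_values", "frequency"))   # count organisations per set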
output
unique_values frequency
1 0,1 2
2 0,1,2 1
3 0,2 1
You can do something simple like this:
library(dplyr)
library(stringr)
distinct(df) %>%
arrange(variable) %>%
group_by(organisation) %>%
summarize(unique_values = str_c(variable,collapse = ",")) %>%
count(unique_values)
Output:
unique_values n
<chr> <int>
1 0,1 2
2 0,1,2 1
3 0,2 1
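The same idea also fits in two base-R steps: build one string of sorted distinct values per organisation, then tabulate those strings. A minimal sketch, assuming the `df` from the question; `sets` and `freq` are illustrative names:

```r
organisation <- rep(c("A", "B", "C", "D"), each = 4)
variable <- c("0","0","1","2","0","0","1","1","0","0","1","1","0","0","2","2")
df <- data.frame(organisation, variable)

# One string per organisation, e.g. "0,1,2"
sets <- sapply(split(df$variable, df$organisation),
               function(x) paste(sort(unique(x)), collapse = ","))

# How many organisations share each set of values
freq <- table(unique_values = sets)
freq
```

`as.data.frame(freq)` turns the result into a two-column data frame if a data frame is needed downstream.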

How can I summarize statistics in a loop in R

I have a dataset containing about 60 variables (A, B, C, D, ...), each with 3 corresponding information columns (A, Group_A and WOE_A) as in the list below:
ID A Group_A WOE_A B Group_B WOE_B C Group_C WOE_C D Group_D WOE_D Status
213 0 1 0.87 0 1 0.65 0 1 0.80 915.7 4 -0.30 1
321 12 5 0.08 4 4 -0.43 6 5 -0.20 85.3 2 0.26 0
32 0 1 0.87 0 1 0.65 0 1 0.80 28.6 2 0.26 1
13 7 4 -0.69 2 3 -0.82 4 4 -0.80 31.8 2 0.26 0
43 1 2 -0.04 1 2 -0.49 1 2 -0.22 51.7 2 0.26 0
656 2 3 -0.28 2 3 -0.82 2 3 -0.65 8.5 1 1.14 0
435 2 3 -0.28 0 1 0.65 0 1 0.80 39.8 2 0.26 0
65 8 4 -0.69 3 4 -0.43 5 4 -0.80 243.0 3 0.00 0
565 0 1 0.87 0 1 0.65 0 1 0.80 4.0 1 1.14 0
432 0 1 0.87 0 1 0.65 0 1 0.80 81.6 2 0.26 0
I want to print a table in R with some statistics (Min(A), Max(A), WOE_A, Count(Group_A), Count(Group_A, where Status=1), Count(Group_A, where Status=0)), all grouped by Group for each of the 60 variables and I think I need to perform it in a loop.
I tried the "dplyr" package, but I don't know how to refer to all the three columns (A, Group_A and WOE_A) that relate to a variable (A) and also how to summarize the information for all the desired statistics.
The code I began with is:
df <- data
List <- list(df)
for (colname in colnames(df)) {
List[[colname]]<- df %>%
group_by(df[,colname]) %>%
count()
}
List
This is how I want to print results:
Var A
Group Min(A) Max(A) WOE_A Count(Group_A) Count_1(Group_A, where Status=1) Count_0(Group_A, where Status=0)
1
2
3
4
5
Thank you very much!
Laura
Laura, as mentioned by the others, working with "long" data frames is better than with wide data frames.
Your initial idea using dplyr and group_by() got you almost there.
Note: if a full wide-to-long reshape becomes unwieldy, you can also use this approach to extract one variable at a time and combine the results afterwards.
Let's start with this:
library(dplyr)
#---------- extract all "A" measurements
df %>%
select(A, Group_A, WOE_A, Status) %>%
#---------- grouped summary of multiple stats
group_by(A) %>%
summarise(
Min = min(A)
, Max = max(A)
, WOE_A = unique(WOE_A)
, Count = n() # n() is a helper function of dplyr
, CountStatus1 = sum(Status == 1) # use sum() to count logical conditions
, CountStatus0 = sum(Status == 0)
)
This yields:
# A tibble: 6 x 7
A Min Max WOE_A Count CountStatus1 CountStatus0
<dbl> <dbl> <dbl> <dbl> <int> <int> <int>
1 0 0 0 0.87 4 2 2
2 1 1 1 -0.04 1 0 1
3 2 2 2 -0.28 2 0 2
4 7 7 7 -0.69 1 0 1
5 8 8 8 -0.69 1 0 1
6 12 12 12 0.08 1 0 1
OK. Turning your wide data frame into a long one is not trivial, because your column names nest both the measurement type and the variable name. In addition, ID and Status are key variables for each row.
The standard tool to convert wide to long is tidyr's pivot_longer(). Read up on this.
In your particular case we want to push multiple columns into multiple targets. For this you need to get a feel for the .value sentinel. The pivot_longer() help pages might be useful for studying this case.
To avoid constructing a complex regular expression to decode the variable names, I rename your value columns, e.g. A and B, to X_A and X_B. This ensures that all column names have the form what_letter.
library(tidyr)
df %>%
# ----------- prepare variable names to be well-formed, you may do this upstream
rename(X_A = A, X_B = B, X_C = C, X_D = D) %>%
# ----------- call pivot longer with .value sentinel and names_pattern
# ----------- that is an advanced use of the capabilities
pivot_longer(
cols = -c("ID","Status") # apply to all cols besides ID and Status
, names_to = c(".value", "label") # target column names are based on origin names
# and an individual label (think id, name as u like)
, names_pattern = "(.*)(.*_[A-D]{1})$") # regex for the origin column patterns
# pattern is built of 2 parts (...)(...)
# (.*) no or any symbol possibly multiple times
# (.*_[A-D]{1}) as above, but ending with underscore and 1 letter
This gives you
# A tibble: 40 x 6
ID Status label X Group WOE
<dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 213 1 _A 0 1 0.87
2 213 1 _B 0 1 0.65
3 213 1 _C 0 1 0.8
4 213 1 _D 916. 4 -0.3
5 321 0 _A 12 5 0.08
6 321 0 _B 4 4 -0.43
7 321 0 _C 6 5 -0.2
8 321 0 _D 85.3 2 0.26
9 32 1 _A 0 1 0.87
10 32 1 _B 0 1 0.65
Putting it all together:
df %>%
# ------------ prepare and make long
rename(X_A = A, X_B = B, X_C = C, X_D = D) %>%
pivot_longer(cols = -c("ID","Status")
, names_to = c(".value", "label")
, names_pattern = "(.*)(.*_[A-D]{1})$") %>%
# ------------- calculate stats on groups
group_by(label, X) %>%
summarise(Min = min(X), Max = max(X), WOE = unique(WOE)
,Count = n(), CountStatus1 = sum(Status == 1)
, CountStatus0 = sum(Status == 0)
)
Voila:
# A tibble: 27 x 8
# Groups: label [4]
label X Min Max WOE Count CountStatus1 CountStatus0
<chr> <dbl> <dbl> <dbl> <dbl> <int> <int> <int>
1 _A 0 0 0 0.87 4 2 2
2 _A 1 1 1 -0.04 1 0 1
3 _A 2 2 2 -0.28 2 0 2
4 _A 7 7 7 -0.69 1 0 1
5 _A 8 8 8 -0.69 1 0 1
6 _A 12 12 12 0.08 1 0 1
7 _B 0 0 0 0.65 5 2 3
8 _B 1 1 1 -0.49 1 0 1
9 _B 2 2 2 -0.82 2 0 2
10 _B 3 3 3 -0.43 1 0 1
# ... with 17 more rows
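The question asks for the statistics grouped by Group_A, Group_B, and so on, whereas the code above groups by the raw value. A sketch of that variant, which also builds the rename for every value column instead of listing them by hand, assuming that the original variable names contain no underscore (so `names_sep = "_"` splits each name cleanly) and that WOE is constant within a group (hence `first()`). The data are the ten sample rows from the question; `value_cols`, `res` and `tables` are illustrative names:

```r
library(dplyr)
library(tidyr)

df <- read.table(header = TRUE, text = "
ID  A  Group_A WOE_A B Group_B WOE_B C Group_C WOE_C D     Group_D WOE_D Status
213 0  1  0.87 0 1  0.65 0 1  0.80 915.7 4 -0.30 1
321 12 5  0.08 4 4 -0.43 6 5 -0.20  85.3 2  0.26 0
32  0  1  0.87 0 1  0.65 0 1  0.80  28.6 2  0.26 1
13  7  4 -0.69 2 3 -0.82 4 4 -0.80  31.8 2  0.26 0
43  1  2 -0.04 1 2 -0.49 1 2 -0.22  51.7 2  0.26 0
656 2  3 -0.28 2 3 -0.82 2 3 -0.65   8.5 1  1.14 0
435 2  3 -0.28 0 1  0.65 0 1  0.80  39.8 2  0.26 0
65  8  4 -0.69 3 4 -0.43 5 4 -0.80 243.0 3  0.00 0
565 0  1  0.87 0 1  0.65 0 1  0.80   4.0 1  1.14 0
432 0  1  0.87 0 1  0.65 0 1  0.80  81.6 2  0.26 0
")

# Value columns = everything except ID, Status and the Group_*/WOE_* columns
value_cols <- setdiff(names(df),
                      c("ID", "Status", grep("^(Group|WOE)_", names(df), value = TRUE)))

res <- df %>%
  rename_with(~ paste0("X_", .x), all_of(value_cols)) %>%  # A -> X_A, B -> X_B, ...
  pivot_longer(cols = -c(ID, Status),
               names_to = c(".value", "label"),            # X / Group / WOE + variable label
               names_sep = "_") %>%
  group_by(label, Group) %>%
  summarise(Min = min(X), Max = max(X),
            WOE = first(WOE),                              # assumed constant within a group
            Count = n(),
            CountStatus1 = sum(Status == 1),
            CountStatus0 = sum(Status == 0),
            .groups = "drop")

# One table per variable, ready to print or plot in a loop
tables <- split(res, res$label)
tables$A
```

Using `first()` rather than `unique()` also avoids returning several rows per group, which `summarise()` deprecates in dplyr 1.1.0 and later.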
The loop that I managed to do is available below.
Apart from the tables that I wanted to list, I also needed to make a chart which would show some of the information from each listed table, and then print a PDF with each variable and corresponding table and chart on a different page.
data <- as.data.frame(data)
# 5 is the column holding the first information related to a variable, so for each variable I build the data from its related columns
i <- 5
#each variable has 3 columns (Value, Group, WOE)
for (i in seq(5, 223, 3)){
ID <- data[,1]
A <- data[,i]
Group <- data[,i+1]
WOE <- data[,i+2]
Status <- data[,224]
df <- cbind(ID, A, Group, WOE, Status)
df <- data.frame(df)
# Build table T with its corresponding statistics
T <- df %>%
select(A, Group, WOE, Status) %>%
group_by(Group) %>%
summarise(
Min = min(A, na.rm=TRUE), Max = max(A, na.rm=TRUE), WOE = unique(WOE),
Count = n(),
CountStatus1 = sum(Status == 1),
CountStatus0 = sum(Status == 0),
BadRate = round((CountStatus1/Count)*100,1))
print(colnames(data)[i])
print(T)
# Then I plot some information from Table T
p <- ggplot(T) + geom_col(aes(x=Group, y=CountStatus1), size = 1, color = "darkgreen", fill = "darkgreen")
p <- p + geom_line(aes(x=Group, y=WOE*1000), col="firebrick", size=0.9) +
geom_point(aes(x=Group, y=WOE*1000), col="gray", size=3) +
ggtitle(label = paste("WOE and Event Count by Group", " - " , colnames(data)[i])) +
labs(x = "Group", y = "Event Count", size=7) +
theme(plot.title = element_text(size=8, face="bold", margin = margin(10, 0, 10, 0)),
axis.text.x = element_text(angle=0, hjust = 1)) +
scale_y_continuous(sec.axis = sec_axis(trans = ~ . /1000, name="WOE", breaks = seq(-3, 5, 0.5)))
print(p)
}
The information is printed for all the variables that I need as in the pictures below:
[Image: table for one of the variables]
[Image: chart for the same variable]
However, I now have trouble exporting the results to a PDF: I do not know how to print each table and its chart on a separate page.
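A common pattern (not from the original thread) is to open a `pdf()` device before the loop, draw one page per variable inside it, and call `dev.off()` once afterwards. A minimal sketch, assuming the ggplot2 and gridExtra packages are installed; gridExtra's `tableGrob()` turns a data frame into a drawable table and `grid.arrange()` stacks it above the chart. The `tables` list is a hypothetical stand-in for the per-variable `T` tables (values taken from the sample rows in the question), and the file name is hypothetical:

```r
library(ggplot2)
library(gridExtra)

# Hypothetical stand-in for the tables T built in each loop iteration
tables <- list(
  A = data.frame(Group = 1:5, CountStatus1 = c(2, 0, 0, 0, 0),
                 WOE = c(0.87, -0.04, -0.28, -0.69, 0.08)),
  B = data.frame(Group = 1:4, CountStatus1 = c(2, 0, 0, 0),
                 WOE = c(0.65, -0.49, -0.82, -0.43))
)

out <- file.path(tempdir(), "woe_report.pdf")   # hypothetical output file
pdf(out, width = 8.5, height = 11)               # every page drawn from here on goes into this file
for (v in names(tables)) {
  tab <- tables[[v]]
  p <- ggplot(tab, aes(x = Group, y = CountStatus1)) +
    geom_col(fill = "darkgreen") +
    ggtitle(paste("Event count by group -", v))
  # grid.arrange() starts a new page by default, so each variable
  # gets its own page with the table above the chart
  grid.arrange(tableGrob(tab), p, ncol = 1)
}
dev.off()                                        # close the device so the file is written
```

The same structure works inside the existing loop: call `pdf()` before `for`, replace `print(T)` and `print(p)` with one `grid.arrange(tableGrob(T), p, ncol = 1)`, and call `dev.off()` after the loop.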

How to assign a number between 1 and n in R to rows?

I would like to randomly assign the individuals in my data to a group numbered 1 through 3. How would I do this? (A dplyr solution is preferred.) Rows with the same id must be in the same group.
id #    | group_id
454452  | 1
5450441 | 2
5444531 | 3
5444531 | 3
5404501 | 1
5404041 | 2
5404041 | 2
254252  | 3
541254  | 2
A simple solution might be:
df <- df %>% group_by(id) %>% mutate(group_id = sample(1:3,1))
which (using set.seed(12345)) resulted in:
id group_id
1 454452 3
2 5450441 1
3 5444531 2
4 5444531 2
5 5404501 2
6 5404041 3
7 5404041 3
8 254252 2
9 541254 2
Here's one option:
library(dplyr)
df <-
tibble(ids = c(100, 200, 200, 300, 300, 400))
distinct_ids <-
df %>%
select(ids) %>%
distinct() %>%
mutate(group_num = sample.int(3, size = nrow(.), replace = TRUE))
df %>%
left_join(distinct_ids, by = "ids")
# A tibble: 6 x 2
ids group_num
<dbl> <int>
1 100 3
2 200 1
3 200 1
4 300 3
5 300 3
6 400 2
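In base R we could sample the factorized id and convert it with as.numeric(). Note that this gives each distinct id its own number, from 1 to the number of distinct ids (7 here), rather than a group from 1 to 3.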
set.seed(42) # for sake of reproducibility
dat <- transform(dat, group_id=as.numeric(factor(id, levels=sample(unique(dat$id)))))
dat
# id X1 X2 X3 group_id
# 1 454452 -1.1045994 0.0356312 1.93557177 1
# 2 5450441 0.5390238 1.3149588 1.72323080 5
# 3 5444531 0.5802063 0.9781675 0.35840206 6
# 4 5444531 -0.6575028 0.8817912 0.30243092 6
# 5 5404501 1.5548955 0.4822047 -0.39411451 7
# 6 5404041 -1.1876414 0.9657529 0.78814062 2
# 7 5404041 0.1518129 -0.8145709 0.67070383 2
# 8 254252 -1.0861326 0.2839578 -0.94918081 4
# 9 541254 1.6133728 -0.1616986 0.03613574 3
Data
dat <- structure(list(id = c(454452L, 5450441L, 5444531L, 5444531L,
5404501L, 5404041L, 5404041L, 254252L, 541254L), X1 = c(-1.10459944068306,
0.539023801893912, 0.580206320853481, -0.657502835154674, 1.55489554810057,
-1.18764140164182, 0.151812914504533, -1.08613257605253, 1.61337280035418
), X2 = c(0.0356311982051355, 1.31495884897891, 0.978167526364279,
0.881791226863203, 0.482204688262918, 0.965752878105794, -0.814570938270238,
0.283957806364306, -0.161698647607024), X3 = c(1.93557176599585,
1.72323079854894, 0.358402056802064, 0.3024309248682, -0.394114506412192,
0.788140622823556, 0.67070382675052, -0.949180809687611, 0.0361357384849679
)), class = "data.frame", row.names = c(NA, -9L))
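To restrict the ids to groups 1 to 3 in base R, one option is to draw a group for each distinct id and look it up for every row with `match()`, so repeated ids always share a group. A minimal sketch, assuming the ids from the question; using `rep_len()` before shuffling is an illustrative choice that keeps the group sizes as equal as possible (use `sample(1:3, length(ids), replace = TRUE)` for fully independent draws):

```r
set.seed(42)
dat <- data.frame(id = c(454452, 5450441, 5444531, 5444531, 5404501,
                         5404041, 5404041, 254252, 541254))

ids <- unique(dat$id)
# rep_len() cycles 1, 2, 3, 1, ... over the distinct ids; sample() shuffles that assignment
grp <- sample(rep_len(1:3, length(ids)))
# match() finds each row's id among the distinct ids, so duplicates get the same group
dat$group_id <- grp[match(dat$id, ids)]
dat
```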

Random sampling only a subset of data in R

I have a dataset (N of 2794) of which I want to extract a subset, randomly reallocate the class and put it back into the dataframe.
Example
| Index | B | C | Class|
| 1 | 3 | 4 | Dog |
| 2 | 1 | 9 | Cat |
| 3 | 9 | 1 | Dog |
| 4 | 1 | 1 | Cat |
From the above example, I want to randomly take N observations from column 'Class' and shuffle them, so you get something like this:
| Index | B | C | Class|
| 1 | 3 | 4 | Cat | Re-sampled
| 2 | 1 | 9 | Dog | Re-sampled
| 3 | 9 | 1 | Dog |
| 4 | 1 | 1 | Dog | Re-sampled
This code randomly extracts rows and resamples them, but I don't want to extract the rows. I want to keep them in the data frame.
sample(Class[sample(nrow(Class),N),])
Suppose df is your data frame:
df <- data.frame(index=1:4, B=c(3,1,9,1), C=c(4,9,1,1), Class=c("Dog", "Cat", "Dog", "Cat"))
Would this do what you want?
dfSamp <- sample(1:nrow(df), N)
df$Class[dfSamp] <- sample(df$Class[dfSamp])
I simulated the data frame and did an example:
df <- data.frame(
ID=1:4,
Class=c('Dog', 'Cat', 'Dog', 'Cat')
)
N <- 2
sample_ids <- sample(nrow(df), N)
df$Class[sample_ids] <- sample(df$Class, length(sample_ids))
Assuming Class is the name of your data frame, you could do this:
library(dplyr)
bind_rows(
  Class %>%
    mutate(origin = 'not_sampled'),
  Class %>%
    slice_sample(n = 100, replace = TRUE) %>%   # sample rows; sample() on a data frame would sample columns
    mutate(origin = 'sampled'))
This samples 100 observations (rows) of the original data frame, with replacement, and stacks them below it. I also add a column so you know whether each observation was sampled or was present in the data frame from the beginning.
What you want is to replace some classes in place, but not others.
So, if we start with a data frame, df
set.seed(100)
df = data.frame(index = 1:100,
B = sample(1:10,100,replace = T),
C = sample(1:10,100,replace = T),
Class = sample(c('Cat','Dog','Bunny'),100,replace = T))
If you want to update 5 random rows, we need to pick which rows to update and which new classes to put in those rows. Sampling from unique(df$Class) does not weight the classes by their current occurrence; you could adjust this with the prob argument of sample(), or drop unique() to use occurrence as the weight.
n_rows = 5
rows_to_update = sample(1:100,n_rows,replace = F)
new_classes = sample(unique(df$Class),n_rows,replace = T)
rows_to_update
#> [1] 85 65 94 60 48
new_classes
#> [1] "Bunny" "Dog" "Dog" "Dog" "Bunny"
We can inspect what the original data looked like
df[rows_to_update,]
#> index B C Class
#> 85 85 1 2 Dog
#> 65 65 5 1 Bunny
#> 94 94 5 10 Dog
#> 60 60 3 7 Bunny
#> 48 48 9 1 Cat
We can update this in place with a reference to the column and the rows to update.
df$Class[rows_to_update] = new_classes
df[rows_to_update,]
#> index B C Class
#> 85 85 1 2 Bunny
#> 65 65 5 1 Dog
#> 94 94 5 10 Dog
#> 60 60 3 7 Dog
#> 48 48 9 1 Bunny
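The expected output in the question also marks which rows were re-sampled. A minimal base-R sketch that shuffles the classes only among N randomly chosen rows (so the overall class counts stay the same) and records those rows in a flag column; the column name `resampled` and the value of `N` are illustrative choices, not from the question:

```r
set.seed(1)
df <- data.frame(index = 1:4, B = c(3, 1, 9, 1), C = c(4, 9, 1, 1),
                 Class = c("Dog", "Cat", "Dog", "Cat"))
original <- df$Class                         # kept only to compare before and after

N <- 3
rows <- sample(nrow(df), N)                  # rows that take part in the shuffle
df$Class[rows] <- sample(df$Class[rows])     # permute classes among those rows only
df$resampled <- seq_len(nrow(df)) %in% rows  # flag the rows that were re-sampled
df
```

A shuffled row can end up with its original class by chance; the flag records that the row was drawn, not that its class changed.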

Generate Data Frame from Count Data

I am trying to create an unsummarized data frame from a data frame of count data.
I have had some experience creating sample datasets but I am having some trouble trying to get a specific number of rows and proportion for each state/person without coding each of them separately and then combining them. I was able to do it using the following code but I feel like there is a better way.
set.seed(2312)
dragon <- sample(c(1),3,replace=TRUE)
Maine <- sample(c("Maine"),3,replace=TRUE)
Maine1 <- data.frame(dragon, Maine)
dragon <- sample(c(0),20,replace=TRUE)
Maine <- sample(c("Maine"),20,replace=TRUE)
Maine2 <- data.frame(dragon, Maine)
Maine2
library(dplyr)
maine3 <- bind_rows(Maine1, Maine2)
Is there a better way to generate this dataset than the code above?
I am trying to create a data frame from the following count data:
+-------------+--------------+--------------+
| | # of dragons | # no dragons |
+-------------+--------------+--------------+
| Maine | 3 | 20|
| California | 1 | 10|
| Jocko | 28 | 110515 |
| Jessica Day | 17 | 26122 |
| | 14 | 19655 |
+-------------+--------------+--------------+
And I would like it to look like this:
+-----------------------+---------------+
| | Dragons (1/0) |
+-----------------------+---------------+
| Maine | 1 |
| Maine | 1 |
| Maine | 1 |
| Maine | 0 |
| Maine….(2:20) | 0…. |
| California | 1 |
| California….(2:10) | 0… |
| Ect.. | |
+-----------------------+---------------+
I do not want the code written for me, but would love ideas on functions or examples that you think might be helpful.
I am not sure what sampling has to do with this problem.
It looks to me like you are looking for untable.
Here is an example
data:
set.seed(1)
no_drag = sample(1:5, 5)
drag = sample(15:25, 5)
df <- data.frame(names = LETTERS[1:5],
drag,
no_drag)
names drag no_drag
1 A 24 2
2 B 25 5
3 C 20 4
4 D 23 3
5 E 15 1
library(reshape)
library(tidyverse)
df %>%
gather(key, value, 2:3) %>% #convert to long format
{untable(.,num = .$value)} %>% #untable by value column
mutate(value = ifelse(key == "drag", 1, 0)) %>% #convert values to 1 (dragon) / 0 (no dragon)
select(-key) %>% #remove unwanted column
arrange(names) #optional
#part of output
names value
1 A 1
2 A 1
3 A 1
4 A 1
5 A 1
6 A 1
7 A 1
8 A 1
9 A 1
10 A 1
11 A 1
12 A 1
13 A 1
14 A 1
15 A 1
16 A 1
17 A 1
18 A 1
19 A 1
20 A 1
21 A 1
22 A 1
23 A 1
24 A 1
25 A 0
26 A 0
27 B 1
28 B 1
29 B 1
30 B 1
There are other ways to tackle the problem.
One, as @Frank mentioned in the comments:
df %>%
gather(key, val, 2:3) %>%
mutate(v = Map(rep, key == "drag", val)) %>%
unnest(cols = v) %>%
select(-key, -val)
Another:
df <- gather(df, key, value, 2:3)
df <- df[rep(seq_len(nrow(df)), df$value), 1:2]   # repeat each row `value` times, keep names and key
df$key <- as.integer(df$key == "drag")             # 1 = dragon, 0 = no dragon
One can use tidyr::expand to expand rows into the desired format.
The solution, using the df from @missuse, can be shown as:
library(tidyverse)
df %>% gather(key,value,-names) %>%
mutate(key = ifelse(key=="drag", 1, 0)) %>%
group_by(names,key) %>%
expand(value = seq_len(value)) %>%   # seq_len() also handles zero counts, unlike 1:value
select(names, value = key) %>%
as.data.frame()
# names value
# 1 A 0
# 2 A 0
# 3 A 1
# 4 A 1
# 5 A 1
# 6 A 1
# 7 A 1
# 8 A 1
# 9 A 1
# 10 A 1
# ...so on
# 117 E 1
# 118 E 1
# 119 E 1
# 120 E 1
# 121 E 1
# 122 E 1
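tidyr also has a function made for exactly this step: `uncount()` repeats each row as many times as a weight column says and then drops that column. A minimal sketch, assuming the first two rows of the count table from the question with the hypothetical column names `state`, `dragons` and `no_dragons`:

```r
library(dplyr)
library(tidyr)

counts <- data.frame(state = c("Maine", "California"),
                     dragons = c(3, 1),
                     no_dragons = c(20, 10))

long <- counts %>%
  # one row per state and outcome, with its count in `n`
  pivot_longer(-state, names_to = "dragon", values_to = "n") %>%
  mutate(dragon = as.integer(dragon == "dragons")) %>%  # 1 = dragon, 0 = no dragon
  uncount(n)                                             # repeat each row n times, drop n
long
```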
