How to count by group starting when condition is met - r

df <- data.frame(id = c(1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4),
                 qresult = c(0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0),
                 count = c(0,0,0,0,1,2,0,0,1,2,1,2,3,4,5,6))
> df
id qresult count
1 1 0 0
2 1 0 0
3 1 0 0
4 2 0 0
5 2 1 1
6 2 0 2
7 3 0 0
8 3 0 0
9 3 1 1
10 3 0 2
11 4 1 1
12 4 0 2
13 4 0 3
14 4 0 4
15 4 0 5
16 4 0 6
What would be a way to obtain the count column, which begins counting when the condition qresult == 1 is met and resets for each new id?

We could apply a double cumsum to qresult after grouping: the inner cumsum is 0 until the first 1 within an id and positive afterwards, and the outer cumsum then counts up from that point.
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(count2 = cumsum(cumsum(qresult))) %>%
  ungroup()
-output
# A tibble: 16 × 4
id qresult count count2
<dbl> <dbl> <dbl> <dbl>
1 1 0 0 0
2 1 0 0 0
3 1 0 0 0
4 2 0 0 0
5 2 1 1 1
6 2 0 2 2
7 3 0 0 0
8 3 0 0 0
9 3 1 1 1
10 3 0 2 2
11 4 1 1 1
12 4 0 2 2
13 4 0 3 3
14 4 0 4 4
15 4 0 5 5
16 4 0 6 6
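For comparison, the same double-cumsum idea can be written in data.table (a sketch, using the df from the question):

```r
library(data.table)

# same df as in the question (count column omitted; we recompute it)
df <- data.frame(id = c(1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4),
                 qresult = c(0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0))

# the inner cumsum is 0 before the first qresult == 1 within an id and
# positive afterwards; the outer cumsum then counts up from that point
setDT(df)[, count2 := cumsum(cumsum(qresult)), by = id][]
```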


Editing each row in column in R

I have a data frame that looks like this:
Twin_Pair zyg CDsumTwin1 CDsumTwin2
<chr> <int> <dbl> <dbl>
1 pair1(2891,2892) 2 0 5
2 pair2(4000,4001) 1 0 0
3 pair3(4006,4007) 2 0 3
4 pair4(4009,4010) 2 1 3
5 pair5(4012,4013) 2 2 0
6 pair6(4015,4016) 2 0 9
7 pair7(4018,4019) 2 0 0
8 pair8(4021,4022) 1 0 0
9 pair9(4024,4025) 1 0 0
10 pair10(4027,4028) 2 2 17
How can I remove "pair1", "pair2", etc. from each row in the first column such that I am left with something like (4027,4028)? I know how to remove the first 5 characters, but the problem is that the numbering goes up to pair100, so the prefix length varies. What would be an efficient way to do this?
You need a regex call to match the variable-length prefix. Try this code to see if it works for you:
dat$Twin_Pair <- sub("^pair[0-9]+", "", dat$Twin_Pair)
dat
# Twin_Pair zyg CDsumTwin1 CDsumTwin2
# 1 (2891,2892) 2 0 5
# 2 (4000,4001) 1 0 0
# 3 (4006,4007) 2 0 3
# 4 (4009,4010) 2 1 3
# 5 (4012,4013) 2 2 0
# 6 (4015,4016) 2 0 9
# 7 (4018,4019) 2 0 0
# 8 (4021,4022) 1 0 0
# 9 (4024,4025) 1 0 0
# 10 (4027,4028) 2 2 17
Data
dat <- read.table(text = "Twin_Pair zyg CDsumTwin1 CDsumTwin2
1 'pair1(2891,2892)' 2 0 5
2 'pair2(4000,4001)' 1 0 0
3 'pair3(4006,4007)' 2 0 3
4 'pair4(4009,4010)' 2 1 3
5 'pair5(4012,4013)' 2 2 0
6 'pair6(4015,4016)' 2 0 9
7 'pair7(4018,4019)' 2 0 0
8 'pair8(4021,4022)' 1 0 0
9 'pair9(4024,4025)' 1 0 0
10 'pair10(4027,4028)' 2 2 17",
header = TRUE)
An option with trimws
dat$Twin_Pair <- trimws(dat$Twin_Pair, whitespace = "[^(]+", which = 'left')
-output
> dat
Twin_Pair zyg CDsumTwin1 CDsumTwin2
1 (2891,2892) 2 0 5
2 (4000,4001) 1 0 0
3 (4006,4007) 2 0 3
4 (4009,4010) 2 1 3
5 (4012,4013) 2 2 0
6 (4015,4016) 2 0 9
7 (4018,4019) 2 0 0
8 (4021,4022) 1 0 0
9 (4024,4025) 1 0 0
10 (4027,4028) 2 2 17
We could use str_extract with the regex '\\(.*?\\)', which extracts the parentheses and everything between them:
library(stringr)
library(dplyr)
dat %>%
  mutate(Twin_Pair = str_extract(Twin_Pair, '\\(.*?\\)'))
Twin_Pair zyg CDsumTwin1 CDsumTwin2
1 (2891,2892) 2 0 5
2 (4000,4001) 1 0 0
3 (4006,4007) 2 0 3
4 (4009,4010) 2 1 3
5 (4012,4013) 2 2 0
6 (4015,4016) 2 0 9
7 (4018,4019) 2 0 0
8 (4021,4022) 1 0 0
9 (4024,4025) 1 0 0
10 (4027,4028) 2 2 17
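If you prefer to stay in base R, regmatches() with regexpr() extracts the first parenthesized group in the same way (a sketch on a two-row stand-in for the dat above):

```r
# small stand-in for the Twin_Pair column from the question
dat <- data.frame(Twin_Pair = c("pair1(2891,2892)", "pair10(4027,4028)"),
                  stringsAsFactors = FALSE)

# regexpr() locates the first "(...)" in each string;
# regmatches() pulls out exactly that match
dat$Twin_Pair <- regmatches(dat$Twin_Pair, regexpr("\\([^)]*\\)", dat$Twin_Pair))
```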

group data which are either 0 or 1 [duplicate]

This question already has answers here:
Create counter of consecutive runs of a certain value
(4 answers)
Closed 1 year ago.
I have a vector Blinks whose values are either 0 or 1:
df <- data.frame(
  Blinks = c(0,0,1,1,1,0,0,1,1,1,1,0,0,1,1)
)
I want to insert a grouping variable for when Blinks == 1. I'm using rleid for this, but the run IDs also increment across the stretches where Blinks == 0:
library(dplyr)
library(data.table)
df %>%
  mutate(Blinks_grp = ifelse(Blinks > 0, rleid(Blinks), Blinks))
Blinks Blinks_grp
1 0 0
2 0 0
3 1 2
4 1 2
5 1 2
6 0 0
7 0 0
8 1 4
9 1 4
10 1 4
11 1 4
12 0 0
13 0 0
14 1 6
15 1 6
How can I obtain the correct result:
1 0 0
2 0 0
3 1 1
4 1 1
5 1 1
6 0 0
7 0 0
8 1 2
9 1 2
10 1 2
11 1 2
12 0 0
13 0 0
14 1 3
15 1 3
One option could be:
df %>%
  mutate(Blinks_grp = with(rle(Blinks), rep(cumsum(values) * values, lengths)))
Blinks Blinks_grp
1 0 0
2 0 0
3 1 1
4 1 1
5 1 1
6 0 0
7 0 0
8 1 2
9 1 2
10 1 2
11 1 2
12 0 0
13 0 0
14 1 3
15 1 3
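A run-based alternative without rle(): flag the first row of each run of 1s (a 1 whose previous value is not 1), number the runs with cumsum(), and zero out the rows where Blinks == 0. A data.table sketch, assuming the same Blinks vector:

```r
library(data.table)

df <- data.table(Blinks = c(0,0,1,1,1,0,0,1,1,1,1,0,0,1,1))

# a run of 1s starts where Blinks is 1 and the previous value is not;
# cumsum() numbers the runs, and multiplying by Blinks zeroes the 0 rows
df[, Blinks_grp := cumsum(Blinks == 1 & shift(Blinks, fill = 0) != 1) * Blinks]
```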

How can I calculate the percentage score from test results using tidyverse?

Rather than calculate each individual's score, I want to calculate the percentage of individuals who answered each question correctly. Below is the tibble containing the data: the columns are the candidates, a-r, the rows are the questions, the data points are the answers given, and the right-hand column, named 'correct', shows the correct answer.
A tibble: 20 x 19
question a b c d e g h i j k l m n o p q r correct
<chr> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
1 001 3 3 3 0 4 0 1 4 4 0 2 3 2 0 3 0 3 1
2 002 2 4 2 3 4 NA 4 2 2 2 4 2 4 3 2 2 3 2
3 003 2 2 2 3 4 2 2 4 4 1 4 3 3 2 4 1 3 2
4 005 2 3 1 3 4 NA 2 4 4 2 4 1 4 2 4 2 2 2
5 006 3 1 2 3 3 NA 2 3 4 2 3 3 3 3 3 NA 3 3
6 008 3 3 3 3 3 1 1 3 3 1 3 3 3 3 3 1 3 3
7 010 4 5 4 3 4 4 4 4 4 3 4 4 5 4 4 3 4 4
8 011 3 3 5 3 3 3 3 3 5 4 5 4 4 3 3 2 5 5
9 013 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0
10 014 0 0 0 2 0 1 0 0 0 0 2 0 2 0 0 0 0 0
11 016 3 3 0 0 4 1 1 4 4 2 3 3 3 3 1 0 3 0
12 017 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
13 019 0 1 0 2 1 1 0 1 0 1 2 2 2 1 0 1 1 0
14 020 0 0 0 0 0 0 0 0 0 0 1 3 0 0 0 0 0 0
15 039 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0
16 041 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
17 045 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 047 0 0 0 0 0 NA 0 0 0 0 1 0 0 0 0 0 0 0
19 049 3 3 3 3 4 NA 2 4 x 2 4 3 5 3 1 1 3 3
20 050 0 3 3 0 1 NA 0 3 3 0 x 0 0 0 0 0 3 1
I would like to generate a column 'percentage' that gives the proportion of correct answers for each question. I suspect I have to use loops or row-wise operations, but I'm so far out of my depth that I can't figure out how to compare factors. I've tried mutate(), if_else(), group_by() and much more, but have not managed to get close to an answer.
Any help would be greatly appreciated.
If your data.frame is called data you may try (na.rm = TRUE keeps the NA answers from propagating into the sum):
library(dplyr)
data %>%
  rowwise() %>%
  mutate(percentage = sum(c_across(a:r) == correct, na.rm = TRUE) / length(c_across(a:r)))
You can try this solution using a loop:
# Code
# First select the range of individuals, a to r
index <- 2:18
# Create empty variables to save the results
df$Count <- NA
df$Prop <- NA
# Loop over questions
for (i in seq_len(nrow(df))) {
  x <- df[i, index]
  count <- length(which(x == df$correct[i]))
  percentage <- count / length(index)
  # Assign
  df$Count[i] <- count
  df$Prop[i] <- percentage
}
Output:
question a b c d e g h i j k l m n o p q r correct Count Prop
1 1 3 3 3 0 4 0 1 4 4 0 2 3 2 0 3 0 3 1 1 0.05882353
2 2 2 4 2 3 4 NA 4 2 2 2 4 2 4 3 2 2 3 2 8 0.47058824
3 3 2 2 2 3 4 2 2 4 4 1 4 3 3 2 4 1 3 2 6 0.35294118
4 5 2 3 1 3 4 NA 2 4 4 2 4 1 4 2 4 2 2 2 6 0.35294118
5 6 3 1 2 3 3 NA 2 3 4 2 3 3 3 3 3 NA 3 3 10 0.58823529
6 8 3 3 3 3 3 1 1 3 3 1 3 3 3 3 3 1 3 3 13 0.76470588
7 10 4 5 4 3 4 4 4 4 4 3 4 4 5 4 4 3 4 4 12 0.70588235
8 11 3 3 5 3 3 3 3 3 5 4 5 4 4 3 3 2 5 5 4 0.23529412
9 13 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 14 0.82352941
10 14 0 0 0 2 0 1 0 0 0 0 2 0 2 0 0 0 0 0 13 0.76470588
11 16 3 3 0 0 4 1 1 4 4 2 3 3 3 3 1 0 3 0 3 0.17647059
12 17 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 15 0.88235294
13 19 0 1 0 2 1 1 0 1 0 1 2 2 2 1 0 1 1 0 5 0.29411765
14 20 0 0 0 0 0 0 0 0 0 0 1 3 0 0 0 0 0 0 15 0.88235294
15 39 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 14 0.82352941
16 41 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 14 0.82352941
17 45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 1.00000000
18 47 0 0 0 0 0 NA 0 0 0 0 1 0 0 0 0 0 0 0 15 0.88235294
19 49 3 3 3 3 4 NA 2 4 NA 2 4 3 5 3 1 1 3 3 7 0.41176471
20 50 0 3 3 0 1 NA 0 3 3 0 NA 0 0 0 0 0 3 1 1 0.05882353
There were some x values in the answers, so I replaced them with NA to make the loop work.
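The loop can also be vectorized. Comparing a data frame to a vector recycles the vector down each column, so df[index] == df$correct lines every candidate's answers up against the answer key; rowSums() then counts the matches per question. A sketch on a small stand-in for the data (three questions, three candidates):

```r
# small stand-in for the test data
df <- data.frame(question = c("001", "002", "003"),
                 a = c(3, 2, 2), b = c(3, 4, 2), c = c(1, 2, 4),
                 correct = c(3, 2, 2))
index <- 2:4  # the candidate columns

# each column of df[index] is compared to the full 'correct' vector,
# so cell [i, j] tests whether candidate j got question i right
m <- df[index] == df$correct
df$Count <- rowSums(m, na.rm = TRUE)   # NA answers count as wrong
df$Prop  <- df$Count / length(index)
```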

How to create new columns in R every time a given value appears?

I have a question regarding creating new columns if a certain value appears in an existing row.
N=5
T=5
time<-rep(1:T, times=N)
id<- rep(1:N,each=T)
dummy<- c(0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,0)
df <- data.frame(id, time, dummy)
id time dummy
1 1 1 0
2 1 2 0
3 1 3 1
4 1 4 1
5 1 5 0
6 2 1 0
7 2 2 0
8 2 3 1
9 2 4 0
10 2 5 0
11 3 1 0
12 3 2 1
13 3 3 0
14 3 4 1
15 3 5 0
16 4 1 0
17 4 2 0
18 4 3 0
19 4 4 0
20 4 5 0
21 5 1 1
22 5 2 0
23 5 3 0
24 5 4 1
25 5 5 0
In this case there are some cross-sections in which more than one 1 appears. I want to create a new dummy variable/column for each additional 1; after the first 1 appears in a cross-section, the remaining rows of that cross-section should also be filled with 1s. I can fill the rows by using group_by(id) and the cummax function on each column, but how do I get the new variables without going through every cross-section manually? So I want to achieve the following:
id time dummy dummy2
1 1 1 0 0
2 1 2 0 0
3 1 3 1 0
4 1 4 1 1
5 1 5 1 1
6 2 1 0 0
7 2 2 0 0
8 2 3 1 0
9 2 4 1 0
10 2 5 1 0
11 3 1 0 0
12 3 2 1 0
13 3 3 1 0
14 3 4 1 1
15 3 5 1 1
16 4 1 0 0
17 4 2 0 0
18 4 3 0 0
19 4 4 0 0
20 4 5 0 0
21 5 1 1 0
22 5 2 1 0
23 5 3 1 0
24 5 4 1 1
25 5 5 1 1
Thanks! :)
You can use cummax for dummy1, and you also need cumsum to create dummy2:
df %>%
  group_by(id) %>%
  mutate(dummy1 = cummax(dummy),  # don't alter 'dummy' here, we need it in the next line
         dummy2 = cummax(cumsum(dummy) == 2)) %>%
  as.data.frame()  # needed only to display the entire result
# id time dummy dummy1 dummy2
#1 1 1 0 0 0
#2 1 2 0 0 0
#3 1 3 1 1 0
#4 1 4 1 1 1
#5 1 5 0 1 1
#6 2 1 0 0 0
#7 2 2 0 0 0
#8 2 3 1 1 0
#9 2 4 0 1 0
#10 2 5 0 1 0
#11 3 1 0 0 0
#12 3 2 1 1 0
#13 3 3 0 1 0
#14 3 4 1 1 1
#15 3 5 0 1 1
#16 4 1 0 0 0
#17 4 2 0 0 0
#18 4 3 0 0 0
#19 4 4 0 0 0
#20 4 5 0 0 0
#21 5 1 1 1 0
#22 5 2 0 1 0
#23 5 3 0 1 0
#24 5 4 1 1 1
#25 5 5 0 1 1
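To generate one dummy column per additional 1 without hardcoding dummy2, dummy3, and so on, you can loop over the running count: within each id, cumsum(dummy) >= k is exactly the k-th dummy (for k = 1 this reduces to cummax(dummy)). A sketch, assuming the df from the question:

```r
library(dplyr)

# same df as in the question
df <- data.frame(id = rep(1:5, each = 5),
                 time = rep(1:5, times = 5),
                 dummy = c(0,0,1,1,0, 0,0,1,0,0, 0,1,0,1,0,
                           0,0,0,0,0, 1,0,0,1,0))

# running count of 1s within each id
df <- df %>% group_by(id) %>% mutate(cs = cumsum(dummy)) %>% ungroup()

# one column per possible count: dummyk turns on once the k-th 1 has appeared
for (k in seq_len(max(df$cs))) {
  df[[paste0("dummy", k)]] <- as.integer(df$cs >= k)
}
df$cs <- NULL  # drop the helper column
```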

How to count number of particular values

My data looks like this:
ID CO MV
1 0 1
1 5 0
1 0 1
1 9 0
1 8 0
1 0 1
2 69 0
2 0 1
2 8 0
2 0 1
2 78 0
2 53 0
2 0 1
2 3 0
3 54 0
3 0 1
3 8 0
3 90 0
3 0 1
3 56 0
4 0 1
4 56 0
4 0 1
4 45 0
4 0 1
4 34 0
4 31 0
4 0 1
4 45 0
5 0 1
5 0 1
5 67 0
I want it to look like this:
ID CO MV CONUM
1 0 1 3
1 5 0 3
1 0 1 3
1 9 0 3
1 8 0 3
1 0 1 3
2 69 0 5
2 0 1 5
2 8 0 5
2 0 1 5
2 78 0 5
2 53 0 5
2 0 1 5
2 3 0 5
3 54 0 4
3 0 1 4
3 8 0 4
3 90 0 4
3 0 1 4
3 56 0 4
4 0 1 5
4 56 0 5
4 0 1 5
4 45 0 5
4 0 1 5
4 34 0 5
4 31 0 5
4 0 1 5
4 45 0 5
5 0 1 1
5 0 1 1
5 67 0 1
I want to create a column CONUM which, for each value in the ID column, holds the total number of non-zero values in the CO column. For example, the CO column for ID 1 has 3 values other than zero, so the corresponding values in the CONUM column are 3. The MV column is 0 if the CO column has a value and 1 if the CO column is 0, so another way to create the CONUM column would be to count the number of zeros in the MV column per ID. It would be great if you could help me with the R code to accomplish this. Thanks.
Here is an option with data.table
library(data.table)
setDT(df)[, CONUM := sum(CO != 0), ID][]
You can use ave in base R:
dat <- transform(dat, CONUM = ave(as.logical(CO), ID, FUN = sum))
and an option with dplyr
# install.packages("dplyr")
library(dplyr)
dat <- dat %>%
  group_by(ID) %>%
  mutate(CONUM = sum(CO != 0))
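Since MV is 1 exactly when CO is 0, counting MV == 0 per ID gives the same result, which makes for a quick consistency check. A base-R sketch on a toy subset of the data:

```r
# toy subset: IDs 1 and 5 from the question
dat <- data.frame(ID = c(1,1,1,1,1,1, 5,5,5),
                  CO = c(0,5,0,9,8,0, 0,0,67),
                  MV = c(1,0,1,0,0,1, 1,1,0))

# both routes count the nonzero-CO rows per ID
a <- ave(dat$CO != 0, dat$ID, FUN = sum)
b <- ave(dat$MV == 0, dat$ID, FUN = sum)
dat$CONUM <- a
```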
