group data which are either 0 or 1 [duplicate]

I have a vector Blinks whose values are either 0 or 1:
df <- data.frame(
Blinks = c(0,0,1,1,1,0,0,1,1,1,1,0,0,1,1)
)
I want to insert a grouping variable that numbers the runs where Blinks == 1. I'm using rleid for this, but the numbering also counts the runs where Blinks == 0:
library(dplyr)
library(data.table)
df %>%
  mutate(Blinks_grp = ifelse(Blinks > 0, rleid(Blinks), Blinks))
Blinks Blinks_grp
1 0 0
2 0 0
3 1 2
4 1 2
5 1 2
6 0 0
7 0 0
8 1 4
9 1 4
10 1 4
11 1 4
12 0 0
13 0 0
14 1 6
15 1 6
How can I obtain the correct result:
1 0 0
2 0 0
3 1 1
4 1 1
5 1 1
6 0 0
7 0 0
8 1 2
9 1 2
10 1 2
11 1 2
12 0 0
13 0 0
14 1 3
15 1 3

One option could be:
df %>%
  mutate(Blinks_grp = with(rle(Blinks), rep(cumsum(values) * values, lengths)))
Blinks Blinks_grp
1 0 0
2 0 0
3 1 1
4 1 1
5 1 1
6 0 0
7 0 0
8 1 2
9 1 2
10 1 2
11 1 2
12 0 0
13 0 0
14 1 3
15 1 3
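
To see why this works, it may help to unpack the rle() call step by step. The sketch below uses only base R on the Blinks vector from the question. The variable names (r, run_id, alt) are mine, and the diff()-based alternative is an assumption about what else would work for strictly 0/1 data, not part of the answer above.

```r
Blinks <- c(0,0,1,1,1,0,0,1,1,1,1,0,0,1,1)

# Run-length encoding: one entry per run of identical values
r <- rle(Blinks)
# r$values : 0 1 0 1 0 1
# r$lengths: 2 3 2 4 2 2

# cumsum(values) counts the runs of 1s seen so far (0 1 1 2 2 3);
# multiplying by values zeroes out the runs of 0s (0 1 0 2 0 3)
run_id <- cumsum(r$values) * r$values

# Expand each run id back to the original length
Blinks_grp <- rep(run_id, r$lengths)
Blinks_grp
# [1] 0 0 1 1 1 0 0 2 2 2 2 0 0 3 3

# Alternative sketch: count the 0 -> 1 transitions directly
alt <- cumsum(diff(c(0, Blinks)) == 1) * Blinks
```

Both versions assume Blinks contains only 0 and 1 with no NA values; with missing values the cumulative sums would propagate NA.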

Related

R How to count by group starting when condition is met

df <- data.frame (id = c(1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4),
qresult=c(0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0),
count=c(0,0,0,0,1,2,0,0,1,2,1,2,3,4,5,6))
> df
id qresult count
1 1 0 0
2 1 0 0
3 1 0 0
4 2 0 0
5 2 1 1
6 2 0 2
7 3 0 0
8 3 0 0
9 3 1 1
10 3 0 2
11 4 1 1
12 4 0 2
13 4 0 3
14 4 0 4
15 4 0 5
16 4 0 6
What would be a way to obtain the count column, which begins counting when the condition qresult == 1 is met and resets for each new id?

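We could apply cumsum twice to qresult after grouping by id. The inner cumsum turns to 1 from the first qresult == 1 onward, and the outer cumsum then counts the rows from that point (this relies on each id having at most one 1, as in the example).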
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(count2 = cumsum(cumsum(qresult))) %>%
  ungroup()
Output:
# A tibble: 16 × 4
id qresult count count2
<dbl> <dbl> <dbl> <dbl>
1 1 0 0 0
2 1 0 0 0
3 1 0 0 0
4 2 0 0 0
5 2 1 1 1
6 2 0 2 2
7 3 0 0 0
8 3 0 0 0
9 3 1 1 1
10 3 0 2 2
11 4 1 1 1
12 4 0 2 2
13 4 0 3 3
14 4 0 4 4
15 4 0 5 5
16 4 0 6 6
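
For anyone wondering what the two cumsum calls do individually, here is a minimal base R sketch of the same idea using ave() instead of dplyr. The cummax() variant at the end is my own assumption about how to handle an id with more than one 1; it is not part of the answer above.

```r
df <- data.frame(id = c(1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4),
                 qresult = c(0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0))

# Same double cumsum, grouped by id with base R's ave()
df$count2 <- ave(df$qresult, df$id, FUN = function(x) cumsum(cumsum(x)))
df$count2
# [1] 0 0 0 0 1 2 0 0 1 2 1 2 3 4 5 6

# What happens inside one group (id == 3):
x <- c(0, 0, 1, 0)
inner <- cumsum(x)      # 0 0 1 1  -> "has a 1 been seen yet?"
outer <- cumsum(inner)  # 0 0 1 2  -> rows counted since the first 1

# Hypothetical group with two 1s: the double cumsum over-counts,
# while cummax() keeps the indicator at 1
y <- c(0, 1, 0, 1, 0)
double_cs <- cumsum(cumsum(y))  # 0 1 2 4 6
robust    <- cumsum(cummax(y))  # 0 1 2 3 4
```

If every id is guaranteed to have at most one 1, the two forms give the same result.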

Editing each row in column in R

I have a data frame that looks like this:
Twin_Pair zyg CDsumTwin1 CDsumTwin2
<chr> <int> <dbl> <dbl>
1 pair1(2891,2892) 2 0 5
2 pair2(4000,4001) 1 0 0
3 pair3(4006,4007) 2 0 3
4 pair4(4009,4010) 2 1 3
5 pair5(4012,4013) 2 2 0
6 pair6(4015,4016) 2 0 9
7 pair7(4018,4019) 2 0 0
8 pair8(4021,4022) 1 0 0
9 pair9(4024,4025) 1 0 0
10 pair10(4027,4028) 2 2 17
How can I remove "pair1", "pair2", etc. from each row in the first column so that I am left with something like (4027,4028)? I know how to remove the first 5 characters, but the problem is that the numbering goes up to pair100. What would be an efficient way to do this?

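You need a regex to identify your pattern: ^pair[0-9]+ matches "pair" at the start of the string followed by one or more digits, so the length of the number does not matter. Please test this code to see if it works.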
dat$Twin_Pair <- sub("^pair[0-9]+", "", dat$Twin_Pair)
dat
# Twin_Pair zyg CDsumTwin1 CDsumTwin2
# 1 (2891,2892) 2 0 5
# 2 (4000,4001) 1 0 0
# 3 (4006,4007) 2 0 3
# 4 (4009,4010) 2 1 3
# 5 (4012,4013) 2 2 0
# 6 (4015,4016) 2 0 9
# 7 (4018,4019) 2 0 0
# 8 (4021,4022) 1 0 0
# 9 (4024,4025) 1 0 0
# 10 (4027,4028) 2 2 17
Data
dat <- read.table(text = "Twin_Pair zyg CDsumTwin1 CDsumTwin2
1 'pair1(2891,2892)' 2 0 5
2 'pair2(4000,4001)' 1 0 0
3 'pair3(4006,4007)' 2 0 3
4 'pair4(4009,4010)' 2 1 3
5 'pair5(4012,4013)' 2 2 0
6 'pair6(4015,4016)' 2 0 9
7 'pair7(4018,4019)' 2 0 0
8 'pair8(4021,4022)' 1 0 0
9 'pair9(4024,4025)' 1 0 0
10 'pair10(4027,4028)' 2 2 17",
header = TRUE)
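
As a quick illustration of why a fixed-width cut fails here while the regex does not, consider the sketch below. The vector x is made up for the example, and "pair100(5000,5001)" in particular is a hypothetical value, since the question does not show the three-digit rows.

```r
# Hypothetical sample covering one-, two- and three-digit pair numbers
x <- c("pair1(2891,2892)", "pair10(4027,4028)", "pair100(5000,5001)")

# Dropping the first 5 characters only works for single-digit pairs
fixed_cut <- substring(x, 6)
fixed_cut
# [1] "(2891,2892)"   "0(4027,4028)"  "00(5000,5001)"

# The regex removes "pair" plus however many digits follow it
regex_cut <- sub("^pair[0-9]+", "", x)
regex_cut
# [1] "(2891,2892)" "(4027,4028)" "(5000,5001)"
```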

An option with trimws (the whitespace argument requires R >= 3.6.0):
dat$Twin_Pair <- trimws(dat$Twin_Pair, whitespace = "[^(]+", which = 'left')
Output:
> dat
Twin_Pair zyg CDsumTwin1 CDsumTwin2
1 (2891,2892) 2 0 5
2 (4000,4001) 1 0 0
3 (4006,4007) 2 0 3
4 (4009,4010) 2 1 3
5 (4012,4013) 2 2 0
6 (4015,4016) 2 0 9
7 (4018,4019) 2 0 0
8 (4021,4022) 1 0 0
9 (4024,4025) 1 0 0
10 (4027,4028) 2 2 17

We could use str_extract with the regex '\(.*?\)', which extracts the first parenthesized group, parentheses included:
library(stringr)
library(dplyr)
dat %>%
  mutate(Twin_Pair = str_extract(Twin_Pair, '\\(.*?\\)'))
Twin_Pair zyg CDsumTwin1 CDsumTwin2
1 (2891,2892) 2 0 5
2 (4000,4001) 1 0 0
3 (4006,4007) 2 0 3
4 (4009,4010) 2 1 3
5 (4012,4013) 2 2 0
6 (4015,4016) 2 0 9
7 (4018,4019) 2 0 0
8 (4021,4022) 1 0 0
9 (4024,4025) 1 0 0
10 (4027,4028) 2 2 17
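
The three answers give the same result on the data shown, but they behave differently on values that do not follow the pairN(...) pattern. Here is a rough base R comparison. The test vector is invented for the example, and the regexpr()/regmatches() lines are my stand-in for str_extract so that the sketch runs without stringr.

```r
# Hypothetical inputs: a regular value, a different prefix, no parentheses
x <- c("pair10(4027,4028)", "twin7(1,2)", "pair3")

# 1) Remove a literal "pair<digits>" prefix: non-matching strings are untouched
by_prefix <- sub("^pair[0-9]+", "", x)
# "(4027,4028)" "twin7(1,2)" "pair3"

# 2) Remove everything before the first "(" (same idea as the trimws option):
#    a string without "(" is emptied completely
by_paren <- sub("^[^(]+", "", x)
# "(4027,4028)" "(1,2)" ""

# 3) Extract the first "(...)" group; strings without one become NA
m <- regexpr("\\(.*?\\)", x, perl = TRUE)
hit <- as.integer(m) > 0
extracted <- rep(NA_character_, length(x))
extracted[hit] <- regmatches(x, m)
# "(4027,4028)" "(1,2)" NA
```

Which behaviour is preferable depends on whether malformed rows should be kept as they are, blanked, or flagged as missing.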

formatting table/matrix in R

I am trying to use a package whose example table is in a certain format. I am very new to R and don't know how to get my data into the same format so that I can use the package.
Their table looks like this:
Recipient
Actor 1 10 11 12 2 3 4 5 6 7 8 9
1 0 0 0 1 3 1 1 2 3 0 2 6
10 1 0 0 1 0 0 0 0 0 0 0 0
11 13 5 0 5 3 8 0 1 3 2 2 9
12 0 0 2 0 1 1 1 3 1 1 3 0
2 0 0 2 0 0 1 0 0 0 2 2 1
3 9 9 0 5 16 0 2 8 21 45 13 6
4 21 28 64 22 40 79 0 16 53 76 43 38
5 2 0 0 0 0 0 1 0 3 0 0 1
6 11 22 4 21 13 9 2 3 0 4 39 8
7 5 32 11 9 16 1 0 4 33 0 17 22
8 4 0 2 0 1 11 0 0 0 1 0 1
9 0 0 3 1 0 0 1 0 0 0 0 0
Where mine at the moment is:
X0 X1 X2 X3 X4 X5
0 0 2 3 3 0 0
1 1 0 4 2 0 0
2 0 0 0 0 0 0
3 0 2 2 0 1 0
4 0 0 3 2 0 2
5 0 0 3 3 1 0
I would like to add the Recipient and Actor labels to mine, as well as change the row and column names to 1, ..., 6.
Also my data is listed under Data in my Workspace and it says:
'num' [1:6,1:6] 0 1 ...
Whereas the example data in the workspace is shown in Values as:
'table' num [1:12,1:12] 0 1 13 ...
Please let me know if you have any suggestions for getting my data into the same type and style as theirs. All help is greatly appreciated!

OK, so you have a matrix like so:
m <- matrix(c(1:9), 3)
rownames(m) <- 0:2
colnames(m) <- paste0("X", 0:2)
# X0 X1 X2
#0 1 4 7
#1 2 5 8
#2 3 6 9
First you need to remove the Xs and turn it into a table:
colnames(m) <- sub("X", "", colnames(m))
m <- as.table(m)
# 0 1 2
#0 1 4 7
#1 2 5 8
#2 3 6 9
Then you can set the dimension names:
names(dimnames(m)) <- c("Actor", "Recipient")
# Recipient
#Actor 0 1 2
# 0 1 4 7
# 1 2 5 8
# 2 3 6 9
However, usually you would create the contingency table from raw data using the table function, which would automatically return a table object. So, maybe you should fix the step creating your matrix?
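
Applied to the 6 x 6 matrix from the question, the same steps could look like the sketch below. I typed the values in from the printed table, so treat the matrix as a reconstruction. The events data frame at the end is entirely hypothetical and only illustrates the table() route mentioned above.

```r
# Reconstruction of the asker's matrix (row names 0..5, column names X0..X5)
m <- matrix(c(0, 2, 3, 3, 0, 0,
              1, 0, 4, 2, 0, 0,
              0, 0, 0, 0, 0, 0,
              0, 2, 2, 0, 1, 0,
              0, 0, 3, 2, 0, 2,
              0, 0, 3, 3, 1, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(as.character(0:5), paste0("X", 0:5)))

# Rename rows and columns to 1..6, convert to a table, label the dimensions
dimnames(m) <- list(as.character(1:6), as.character(1:6))
tab <- as.table(m)
names(dimnames(tab)) <- c("Actor", "Recipient")
class(tab)
# [1] "table"

# Hypothetical raw data: one row per observed interaction
events <- data.frame(Actor = c(1, 1, 2, 3), Recipient = c(2, 3, 1, 1))
raw_tab <- table(events)  # dimension labels come from the column names
```

Note that as.table() does not change the numbers, only the class and the way the object prints; whether the package accepts it still depends on what that package checks for.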

How to create new columns in R every time a given value appears?

I have a question regarding creating new columns if a certain value appears in an existing row.
N=5
T=5
time<-rep(1:T, times=N)
id<- rep(1:N,each=T)
dummy<- c(0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,0)
df <- data.frame(id, time, dummy)
id time dummy
1 1 1 0
2 1 2 0
3 1 3 1
4 1 4 1
5 1 5 0
6 2 1 0
7 2 2 0
8 2 3 1
9 2 4 0
10 2 5 0
11 3 1 0
12 3 2 1
13 3 3 0
14 3 4 1
15 3 5 0
16 4 1 0
17 4 2 0
18 4 3 0
19 4 4 0
20 4 5 0
21 5 1 1
22 5 2 0
23 5 3 0
24 5 4 1
25 5 5 0
In this case some cross-sections (ids) contain more than one 1. I want to create a new dummy variable/column for each additional 1. In addition, within each cross-section every dummy should stay at 1 in all rows after its first 1 appears. I can fill the rows by using group_by(id) and the cummax function on each column. But how do I get the new variables without going through every cross-section manually? I want to achieve the following:
id time dummy dummy2
1 1 1 0 0
2 1 2 0 0
3 1 3 1 0
4 1 4 1 1
5 1 5 1 1
6 2 1 0 0
7 2 2 0 0
8 2 3 1 0
9 2 4 1 0
10 2 5 1 0
11 3 1 0 0
12 3 2 1 0
13 3 3 1 0
14 3 4 1 1
15 3 5 1 1
16 4 1 0 0
17 4 2 0 0
18 4 3 0 0
19 4 4 0 0
20 4 5 0 0
21 5 1 1 0
22 5 2 1 0
23 5 3 1 0
24 5 4 1 1
25 5 5 1 1
Thanks! :)

You can use cummax to fill dummy, and you need cumsum to create dummy2:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(dummy1 = cummax(dummy), # don't alter 'dummy' here; we need it in the next line
         dummy2 = cummax(cumsum(dummy) == 2)) %>%
  as.data.frame() # needed only to display the entire result
# id time dummy dummy1 dummy2
#1 1 1 0 0 0
#2 1 2 0 0 0
#3 1 3 1 1 0
#4 1 4 1 1 1
#5 1 5 0 1 1
#6 2 1 0 0 0
#7 2 2 0 0 0
#8 2 3 1 1 0
#9 2 4 0 1 0
#10 2 5 0 1 0
#11 3 1 0 0 0
#12 3 2 1 1 0
#13 3 3 0 1 0
#14 3 4 1 1 1
#15 3 5 0 1 1
#16 4 1 0 0 0
#17 4 2 0 0 0
#18 4 3 0 0 0
#19 4 4 0 0 0
#20 4 5 0 0 0
#21 5 1 1 1 0
#22 5 2 0 1 0
#23 5 3 0 1 0
#24 5 4 1 1 1
#25 5 5 0 1 1
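
If some ids can contain three or more 1s, the same idea can be extended so that the number of dummy columns is not hard-coded. The following is only a sketch in base R: the loop and the column naming (dummy1, dummy2, ...) are my assumptions, and it relies on dummy being strictly 0/1 so that the running count never decreases.

```r
id    <- rep(1:5, each = 5)
time  <- rep(1:5, times = 5)
dummy <- c(0,0,1,1,0, 0,0,1,0,0, 0,1,0,1,0, 0,0,0,0,0, 1,0,0,1,0)
df <- data.frame(id, time, dummy)

# Running count of 1s within each id
cs <- ave(df$dummy, df$id, FUN = cumsum)

# One column per "k-th 1": it switches to 1 once at least k ones were seen
for (k in seq_len(max(cs))) {
  df[[paste0("dummy", k)]] <- as.integer(cs >= k)
}
names(df)
# [1] "id"     "time"   "dummy"  "dummy1" "dummy2"
```

Because the running count is non-decreasing, cs >= k gives the same column as cummax(cumsum(dummy) == k) and needs no separate fill step.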

cumulative counter in dataframe R

I have a dataframe with many rows, but the structure looks like this:
year factor
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
I need to add a counter as a third column. It should count the consecutive rows that contain zero, resetting to zero each time the value 1 is encountered. The result should look like this:
year factor count
1 0 0
2 0 1
3 0 2
4 0 3
5 0 4
6 0 5
7 0 6
8 0 7
9 1 0
10 0 1
11 0 2
12 0 3
13 0 4
14 0 5
15 0 6
16 0 7
17 1 0
18 0 1
19 0 2
20 0 3
I would be glad to do it in a quick way, avoiding loops, since I have to do the operations for hundreds of files.
You can recreate my data frame by pasting it in place of "..." here:
dt <- read.table(text = "...", header = TRUE)

Perhaps a solution like this with ave would work for you:
A <- cumsum(dt$factor)
ave(A, A, FUN = seq_along) - 1
# [1] 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3
Original answer:
(Missed that the first value was supposed to be "0". Oops.)
x <- rle(dt$factor == 1)
y <- sequence(x$lengths)
y[dt$factor == 1] <- 0
y
# [1] 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 0 1 2 3
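
To make the ave() version easier to follow, here it is on a short made-up vector with the intermediate grouping shown. The helper name count_since_one is hypothetical, and the sketch assumes the column holds only 0 and 1.

```r
# Hypothetical helper wrapping the cumsum() + ave() idea from above
count_since_one <- function(f) {
  grp <- cumsum(f)                     # a new group starts at every 1
  ave(grp, grp, FUN = seq_along) - 1   # position within the group, from 0
}

f <- c(0, 0, 0, 1, 0, 0, 1, 0)
grp <- cumsum(f)
grp
# [1] 0 0 0 1 1 1 2 2

counter <- count_since_one(f)
counter
# [1] 0 1 2 0 1 2 0 1
```

Since the question mentions hundreds of files, the function could be applied to each file's column in turn, for example inside lapply() over the list of data frames.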