Cumulative counter in a data frame in R

I have a dataframe with many rows, but the structure looks like this:
year factor
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
I need to add a counter as a third column. It should count the consecutive rows that contain zero, resetting to zero each time the value 1 is encountered. The result should look like this:
year factor count
1 0 0
2 0 1
3 0 2
4 0 3
5 0 4
6 0 5
7 0 6
8 0 7
9 1 0
10 0 1
11 0 2
12 0 3
13 0 4
14 0 5
15 0 6
16 0 7
17 1 0
18 0 1
19 0 2
20 0 3
I would like to do this quickly and without loops, since I have to repeat the operation for hundreds of files.
You can reproduce my dataframe by pasting it in place of "..." here:
dt <- read.table(text = "...", header = TRUE)

Perhaps a solution like this with ave would work for you:
A <- cumsum(dt$factor)
ave(A, A, FUN = seq_along) - 1
# [1] 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3
Original answer:
(Missed that the first value was supposed to be "0". Oops.)
x <- rle(dt$factor == 1)
y <- sequence(x$lengths)
y[dt$factor == 1] <- 0
y
# [1] 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 0 1 2 3
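As a minimal sketch of how the ave approach above could be wrapped for reuse across many files: the data frame below is rebuilt from the example in the question, and add_counter is a hypothetical helper name, not part of any package.

```r
# Example data rebuilt from the question: a 1 marks the start of a new block.
dt <- data.frame(
  year   = 1:20,
  factor = c(rep(0, 8), 1, rep(0, 7), 1, rep(0, 3))
)

# Hypothetical helper: add a counter that restarts at 0 on every 1.
add_counter <- function(df) {
  block <- cumsum(df$factor)                          # block id, increases at each 1
  df$count <- ave(block, block, FUN = seq_along) - 1  # position within block, from 0
  df
}

dt <- add_counter(dt)
```

If the files have already been read into a list of data frames, the same helper could presumably be applied with lapply(list_of_dfs, add_counter), where list_of_dfs is a placeholder name.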

Related

change value of row and subsequent rows depending on row number

I have a dataframe where some rows have the value 0. I want to write code that sets the next few rows to 0 as well.
> head(df$n,n=20)
df$n
1 0
2 9009
3 0
4 0
5 0
6 0
7 0
8 5410
9 0
10 0
11 0
12 0
13 0
14 0
15 32
16 0
17 0
18 1054
19 0
20 0
I want to write code that sets the five rows following a 0 to 0 as well.
Basically, a row with 0 stays 0 and the next five rows also become 0.
I tried
for(j in 1:nrow(indx)){
for(i in 1:4){
df$n[j+i]<-0
}
}
where indx is a dataframe containing the row numbers of all rows with 0 values.
This runs, but the result is incorrect.
How do I get my desired output?
> head(df$n,n=20)
df$n
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 5410
9 0
10 0
11 0
12 0
13 0
14 0
15 32
16 0
17 0
18 0
19 0
20 0
Edit: Sorry for the unclear language. My aim is to set the 5 values after a 0 to 0, since they are incorrect data.
Edit 2: I think this code worked for me. It's a little primitive.
for( i in 1:nrow(indx)){
u<-indx[i,]
df[u,]<-0
df[u+1,]<-0
df[u+2,]<-0
df[u+3,]<-0
df[u+4,]<-0
df[u+5,]<-0
}
It introduces extra rows at the end, but otherwise it works.
If I understand correctly, you want to make sure any run of zeros is at least five rows long, unless it's at the end of the data. Here's a dplyr-based solution:
library(dplyr)
df %>%
group_by(zero_run = cumsum(n == 0 & lag(n, default = 1) != 0)) %>%
mutate(
zeros_consecutive = row_number(),
n_new = ifelse(zero_run == 0 | zeros_consecutive > 5, n, 0)
) %>%
ungroup()
# # A tibble: 20 × 4
# n zero_run zeros_consecutive n_new
# <dbl> <int> <int> <dbl>
# 1 0 1 1 0
# 2 9009 1 2 0
# 3 0 2 1 0
# 4 0 2 2 0
# 5 0 2 3 0
# 6 0 2 4 0
# 7 0 2 5 0
# 8 5410 2 6 5410
# 9 0 3 1 0
# 10 0 3 2 0
# 11 0 3 3 0
# 12 0 3 4 0
# 13 0 3 5 0
# 14 0 3 6 0
# 15 32 3 7 32
# 16 0 4 1 0
# 17 0 4 2 0
# 18 1054 4 3 0
# 19 0 5 1 0
# 20 0 5 2 0
I left in the helper columns to better demonstrate the approach, but you could remove these by using n = ifelse(...) instead of n_new = ifelse(...) and adding select(!zero_run:zeros_consecutive).
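For comparison, here is a base R sketch of the same idea without any packages. It assumes the goal is as interpreted above (the first five rows of every zero run are forced to 0) and uses the 20 values from the question; the names prev, starts and idx are only illustrative.

```r
# The 20 example values from the question.
n <- c(0, 9009, 0, 0, 0, 0, 0, 5410, 0, 0,
       0, 0, 0, 0, 32, 0, 0, 1054, 0, 0)

prev   <- c(1, head(n, -1))            # previous value; 1 stands in before row 1
starts <- which(n == 0 & prev != 0)    # first row of each run of zeros

# Each start plus the four rows after it, dropping indices past the end
# so that no extra rows are appended.
idx <- as.vector(outer(starts, 0:4, "+"))
idx <- idx[idx <= length(n)]

n_new <- replace(n, idx, 0)
```

Filtering idx against length(n) is what avoids the extra rows that the loop in the question adds at the end.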

group data which are either 0 or 1 [duplicate]

I have a vector Blinks whose values are either 0 or 1:
df <- data.frame(
Blinks = c(0,0,1,1,1,0,0,1,1,1,1,0,0,1,1)
)
I want to insert a grouping variable for when Blinks == 1. I'm using rleid for this, but the grouping also counts the runs where Blinks == 0:
library(dplyr)
library(data.table)
df %>%
mutate(Blinks_grp = ifelse(Blinks > 0, rleid(Blinks), Blinks))
Blinks Blinks_grp
1 0 0
2 0 0
3 1 2
4 1 2
5 1 2
6 0 0
7 0 0
8 1 4
9 1 4
10 1 4
11 1 4
12 0 0
13 0 0
14 1 6
15 1 6
How can I obtain the correct result:
1 0 0
2 0 0
3 1 1
4 1 1
5 1 1
6 0 0
7 0 0
8 1 2
9 1 2
10 1 2
11 1 2
12 0 0
13 0 0
14 1 3
15 1 3
One option could be:
df %>%
mutate(Blinks_grp = with(rle(Blinks), rep(cumsum(values) * values, lengths)))
Blinks Blinks_grp
1 0 0
2 0 0
3 1 1
4 1 1
5 1 1
6 0 0
7 0 0
8 1 2
9 1 2
10 1 2
11 1 2
12 0 0
13 0 0
14 1 3
15 1 3
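To see why this works, the one-liner can be unpacked step by step in base R. This is only an illustrative breakdown of the answer above; the intermediate names r and run_id are not part of the original.

```r
Blinks <- c(0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1)

r <- rle(Blinks)
# r$values  : 0 1 0 1 0 1   (the value of each run)
# r$lengths : 2 3 2 4 2 2   (how long each run is)

# cumsum(values) numbers the runs of 1s; multiplying by values
# resets the runs of 0s back to 0.
run_id <- cumsum(r$values) * r$values

# Expand the per-run ids back to one id per row.
Blinks_grp <- rep(run_id, r$lengths)
```

Because the numbering relies on cumsum of the run values, it assumes Blinks contains only 0 and 1.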

formatting table/matrix in R

I am trying to use a package whose example table is in a certain format. I am very new to R and don't know how to get my data into the same format so that I can use the package.
Their table looks like this:
Recipient
Actor 1 10 11 12 2 3 4 5 6 7 8 9
1 0 0 0 1 3 1 1 2 3 0 2 6
10 1 0 0 1 0 0 0 0 0 0 0 0
11 13 5 0 5 3 8 0 1 3 2 2 9
12 0 0 2 0 1 1 1 3 1 1 3 0
2 0 0 2 0 0 1 0 0 0 2 2 1
3 9 9 0 5 16 0 2 8 21 45 13 6
4 21 28 64 22 40 79 0 16 53 76 43 38
5 2 0 0 0 0 0 1 0 3 0 0 1
6 11 22 4 21 13 9 2 3 0 4 39 8
7 5 32 11 9 16 1 0 4 33 0 17 22
8 4 0 2 0 1 11 0 0 0 1 0 1
9 0 0 3 1 0 0 1 0 0 0 0 0
Where mine at the moment is:
X0 X1 X2 X3 X4 X5
0 0 2 3 3 0 0
1 1 0 4 2 0 0
2 0 0 0 0 0 0
3 0 2 2 0 1 0
4 0 0 3 2 0 2
5 0 0 3 3 1 0
I would like to add the Recipient and Actor labels to mine, as well as change the row and column names to 1, ..., 6.
Also, my data is listed under Data in my Workspace, where it says:
'num' [1:6,1:6] 0 1 ...
Whereas the example data in the workspace is shown in Values as:
'table' num [1:12,1:12] 0 1 13 ...
Please let me know if you have any suggestions for getting my data into the same type and style as theirs. All help is greatly appreciated!
OK, so you have a matrix like so:
m <- matrix(c(1:9), 3)
rownames(m) <- 0:2
colnames(m) <- paste0("X", 0:2)
# X0 X1 X2
#0 1 4 7
#1 2 5 8
#2 3 6 9
First you need to remove the Xs and turn it into a table:
colnames(m) <- sub("X", "", colnames(m))
m <- as.table(m)
# 0 1 2
#0 1 4 7
#1 2 5 8
#2 3 6 9
Then you can set the dimension names:
names(dimnames(m)) <- c("Actor", "Recipient")
# Recipient
#Actor 0 1 2
# 0 1 4 7
# 1 2 5 8
# 2 3 6 9
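Applied to the 6 x 6 matrix from the question, the same steps might look like the sketch below. The values are typed in from the question, and renaming both dimensions to 1, ..., 6 in one dimnames() call is just one possible way to do it.

```r
# The asker's 6 x 6 data, entered row by row.
m <- matrix(
  c(0, 2, 3, 3, 0, 0,
    1, 0, 4, 2, 0, 0,
    0, 0, 0, 0, 0, 0,
    0, 2, 2, 0, 1, 0,
    0, 0, 3, 2, 0, 2,
    0, 0, 3, 3, 1, 0),
  nrow = 6, byrow = TRUE
)

m <- as.table(m)  # matrix -> table object

# Set the labels 1..6 and the dimension names in one step.
dimnames(m) <- list(Actor = as.character(1:6), Recipient = as.character(1:6))
```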
However, usually you would create the contingency table from raw data using the table function, which would automatically return a table object. So, maybe you should fix the step creating your matrix?
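As a small sketch of that alternative, suppose the raw data has one row per observed interaction. The interactions data frame here is made up for illustration, since the question does not show the raw data.

```r
# Hypothetical raw data: one row per interaction between an actor and a recipient.
interactions <- data.frame(
  Actor     = c(1, 1, 2, 3, 3, 3),
  Recipient = c(2, 3, 1, 1, 1, 2)
)

# table() cross-tabulates the two columns and keeps the column names
# as the names of the dimensions.
tab <- table(interactions)
```

Printing tab should then show the Actor and Recipient labels without any further steps.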

How to count number of particular values

My data looks like this:
ID CO MV
1 0 1
1 5 0
1 0 1
1 9 0
1 8 0
1 0 1
2 69 0
2 0 1
2 8 0
2 0 1
2 78 0
2 53 0
2 0 1
2 3 0
3 54 0
3 0 1
3 8 0
3 90 0
3 0 1
3 56 0
4 0 1
4 56 0
4 0 1
4 45 0
4 0 1
4 34 0
4 31 0
4 0 1
4 45 0
5 0 1
5 0 1
5 67 0
I want it to look like this:
ID CO MV CONUM
1 0 1 3
1 5 0 3
1 0 1 3
1 9 0 3
1 8 0 3
1 0 1 3
2 69 0 5
2 0 1 5
2 8 0 5
2 0 1 5
2 78 0 5
2 53 0 5
2 0 1 5
2 3 0 5
3 54 0 4
3 0 1 4
3 8 0 4
3 90 0 4
3 0 1 4
3 56 0 4
4 0 1 5
4 56 0 5
4 0 1 5
4 45 0 5
4 0 1 5
4 34 0 5
4 31 0 5
4 0 1 5
4 45 0 5
5 0 1 1
5 0 1 1
5 67 0 1
I want to create a column CONUM which is the total number of non-zero values in the CO column for each value of ID. For example, the CO column for ID 1 has 3 non-zero values, so the corresponding values in the CONUM column are 3. The MV column is 0 if CO has a non-zero value and 1 if CO is 0, so another way to create the CONUM column would be to count the number of zeros in MV per ID. It would be great if you could help me with the R code to accomplish this. Thanks.
Here is an option with data.table
library(data.table)
setDT(df)[,CONUM:=sum(CO!=0) ,ID][]
You can use ave in base R:
dat <- transform(dat, CONUM = ave(as.logical(CO), ID, FUN = sum))
and an option with dplyr
# install.packages("dplyr")
library(dplyr)
dat <- dat %>%
group_by(ID) %>%
mutate(CONUM = sum(CO != 0))
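A self-contained check of the base R variant, using only the rows for ID 1 and ID 5 from the question. The second column, CONUM_mv, is an illustrative name for the alternative of counting zeros in MV.

```r
# Subset of the question's data: ID 1 (6 rows) and ID 5 (3 rows).
dat <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 5, 5, 5),
  CO = c(0, 5, 0, 9, 8, 0, 0, 0, 67)
)
dat$MV <- as.integer(dat$CO == 0)

# Count non-zero CO values per ID and repeat the count on every row.
dat$CONUM <- ave(dat$CO != 0, dat$ID, FUN = sum)

# Equivalent count based on MV: number of zeros in MV per ID.
dat$CONUM_mv <- ave(dat$MV == 0, dat$ID, FUN = sum)
```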

Removing the unordered pairs repeated twice in a file in R

I have a file like this in R.
**0 1**
0 2
**0 3**
0 4
0 5
0 6
0 7
0 8
0 9
0 10
**1 0**
1 11
1 12
1 13
1 14
1 15
1 16
1 17
1 18
1 19
**3 0**
As we can see, some unordered pairs are repeated (the marked pairs), such as
1 0
and
0 1
I wish to remove these duplicate pairs. I also want to count how many times each pair occurs and append the count as a third column on the row that is kept. If a pair is not repeated, 1 should be written in the third column.
For example (a sample of the output file):
0 1 2
0 2 1
0 3 2
0 4 1
0 5 1
0 6 1
0 7 1
0 8 1
0 9 1
0 10 1
1 11 1
1 12 1
1 13 1
1 14 1
1 15 1
1 16 1
1 17 1
1 18 1
1 19 1
How can I achieve it in R?
Here is a way using transform, pmin and pmax to reorder the data by row, and then aggregate to provide a count:
# data
x <- data.frame(a=c(rep(0,10),rep(1,10),3),b=c(1:10,0,11:19,0))
#logic
aggregate(count~a+b,transform(x,a=pmin(a,b), b=pmax(a,b), count=1),sum)
a b count
1 0 1 2
2 0 2 1
3 0 3 2
4 0 4 1
5 0 5 1
6 0 6 1
7 0 7 1
8 0 8 1
9 0 9 1
10 0 10 1
11 1 11 1
12 1 12 1
13 1 13 1
14 1 14 1
15 1 15 1
16 1 16 1
17 1 17 1
18 1 18 1
19 1 19 1
Here's one approach:
First, create a vector of the columns sorted and then pasted together.
x <- apply(mydf, 1, function(x) paste(sort(x), collapse = " "))
Then, use ave to create the counts you are looking for.
mydf$count <- ave(seq_along(x), x, FUN = length)  # seq_along(x) keeps the counts integer; ave(x, x, ...) would return them as character
Finally, you can use the "x" vector again, this time to detect and remove duplicated values.
mydf[!duplicated(x), ]
# V1 V2 count
# 1 0 1 2
# 2 0 2 1
# 3 0 3 2
# 4 0 4 1
# 5 0 5 1
# 6 0 6 1
# 7 0 7 1
# 8 0 8 1
# 9 0 9 1
# 10 0 10 1
# 12 1 11 1
# 13 1 12 1
# 14 1 13 1
# 15 1 14 1
# 16 1 15 1
# 17 1 16 1
# 18 1 17 1
# 19 1 18 1
# 20 1 19 1
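The two answers can also be combined into a fully vectorized sketch that avoids the row-wise apply. The data frame is rebuilt from the question with the column names V1 and V2 assumed from the output above, and key is an illustrative name.

```r
# Data rebuilt from the question: 21 rows, with (1, 0) and (3, 0) repeating earlier pairs.
mydf <- data.frame(
  V1 = c(rep(0, 10), rep(1, 10), 3),
  V2 = c(1:10, 0, 11:19, 0)
)

# Order-independent key: smaller value first, larger value second.
key <- paste(pmin(mydf$V1, mydf$V2), pmax(mydf$V1, mydf$V2))

# Number of rows sharing each key, repeated on every row.
mydf$count <- ave(seq_along(key), key, FUN = length)

# Keep only the first occurrence of each unordered pair.
result <- mydf[!duplicated(key), ]
```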
