Cross Tabulation in R Dataframe

I have a dataframe in R:
Subject T O E P Score
1 0 1 0 1 256
2 1 0 1 0 325
2 0 1 0 1 125
3 0 1 0 1 27
4 0 0 0 1 87
5 0 1 0 1 125
6 0 1 1 1 100
This is just an excerpt of the data frame. In reality, I have many rows for each subject, but the subjects only range from 1 to 6.
For each Subject, the possible values are:
T : 0 or 1
O : 0 or 1
E : 0 or 1
P : 0 or 1
Score : Numeric value
I want to create a new data frame with 6 rows (one for each subject) containing the mean score for each of these combinations:
T, O, E, P, TO, TE, TP, OE, OP, PE, TOP, TOE, POE, PET
These will be the columns of the new data frame.
The final output should look like this:
Subject T O E P TO TE TP OE OP PE TOP TOE POE PET
1
2
3
4
5
6
For each subject (row) and combination (column), the value should be the MEAN SCORE.
I tried aggregate and table, but I can't seem to get what I want.
Sorry, I am new to R.
Thanks

I had to rebuild sample data to answer the question as I understood it; tell me if it works for you:
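set.seed(2)
# replace = TRUE is spelled out because the data also has a column named T
df <- data.frame(subject = sample(1:3, 9, replace = TRUE),
                 T = sample(c(0, 1), 9, replace = TRUE),
                 O = sample(c(0, 1), 9, replace = TRUE),
                 E = sample(c(0, 1), 9, replace = TRUE),
                 P = sample(c(0, 1), 9, replace = TRUE),
                 score = round(rnorm(9, 10, 3)))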
# subject T O E P score
# 1 1 1 0 0 1 12
# 2 3 1 0 1 0 9
# 3 2 0 1 0 1 13
# 4 1 1 0 0 0 3
# 5 3 0 1 0 1 14
# 6 3 0 0 1 0 13
# 7 1 1 0 1 0 17
# 8 3 1 0 1 0 12
# 9 2 0 0 1 1 14
cols1 <- c("T","O","E","P")
df$comb <- apply(df[cols1],1,function(x) paste(names(df[cols1])[as.logical(x)],collapse=""))
# subject T O E P score comb
# 1 1 1 0 0 1 12 TP
# 2 3 1 0 1 0 9 TE
# 3 2 0 1 0 1 13 OP
# 4 1 1 0 0 0 3 T
# 5 3 0 1 0 1 14 OP
# 6 3 0 0 1 0 13 E
# 7 1 1 0 1 0 17 TE
# 8 3 1 0 1 0 12 TE
# 9 2 0 0 1 1 14 EP
library(tidyverse)
df %>%
group_by(subject,comb) %>%
summarize(score=mean(score)) %>%
spread(comb,score) %>%
ungroup
# # A tibble: 3 x 7
# subject E EP OP T TE TP
# * <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 NA NA NA 3 17.0 12
# 2 2 NA 14 13 NA NA NA
# 3 3 13 NA 14 NA 10.5 NA
The second step in base R:
means <- aggregate(score ~ subject + comb,df,mean)
means2 <- reshape(means,timevar="comb",idvar="subject",direction="wide")
setNames(means2,c("subject",sort(unique(df$comb))))
# subject E EP OP T TE TP
# 1 3 13 NA 14 NA 10.5 NA
# 2 2 NA 14 13 NA NA NA
# 5 1 NA NA NA 3 17.0 12

I'd do it like this:
# using your table data
df = read.table(text =
"Subject T O E P Score
1 0 1 0 1 256
2 1 0 1 0 325
2 0 1 0 1 125
3 0 1 0 1 27
4 0 0 0 1 87
5 0 1 0 1 125
6 0 1 1 1 100", stringsAsFactors = FALSE, header=TRUE)
# your desired column names
new_names <- c("T", "O", "E", "P", "TO", "TE", "TP", "OE",
"OP", "PE", "TOP", "TOE", "POE", "PET")
# assigning each of your scores to one of the desired column names
assign_comb <- function(dfrow) {
  # columns 2:5 of each row are T, O, E, P
  selection <- c("T", "O", "E", "P")[as.logical(dfrow[2:5])]
  # collapse = "" also returns "" (not character(0)) for an all-zero row
  paste(selection, collapse = "")
}
df$comb <- apply(df, 1, assign_comb)
# aggregate all the means together
df_agg <- aggregate(Score ~ comb + Subject, data = df, FUN = mean)
# reshape the data to wide format
df_new <- reshape(df_agg, v.names = "Score", idvar = "Subject",
                  timevar = "comb", direction = "wide")
# clean up the column names to match your desired output
# any column names not found will be added as NA
colnames(df_new) <- sub("^Score\\.", "", colnames(df_new))
df_new[, new_names[!new_names %in% colnames(df_new)]] <- NA
df_new <- df_new[, c("Subject", new_names)]
With the result:
> df_new
Subject T O E P TO TE TP OE OP PE TOP TOE POE PET
1 1 NA NA NA NA NA NA NA NA 256 NA NA NA NA NA
2 2 NA NA NA NA NA 325 NA NA 125 NA NA NA NA NA
4 3 NA NA NA NA NA NA NA NA 27 NA NA NA NA NA
5 4 NA NA NA 87 NA NA NA NA NA NA NA NA NA NA
6 5 NA NA NA NA NA NA NA NA 125 NA NA NA NA NA
7 6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
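In the result above, subject 6 comes out all NA: its row 0 1 1 1 is labelled "OEP" (letters in T, O, E, P order), which never matches the requested "POE" column, so the final column selection drops it; "PE" and "PET" can never match for the same reason. Below is a minimal sketch of one way around this, assuming the letters of a requested name should match regardless of their order. `canon()` is a hypothetical helper, and tapply() stands in for aggregate() plus reshape() to build the subject × combination table of means:

```r
# Hypothetical helper: rewrite a combination name with its letters in
# T, O, E, P order, so that e.g. "POE" and "OEP" compare equal
canon <- function(x, letter_order = c("T", "O", "E", "P")) {
  vapply(strsplit(x, ""), function(ch) {
    paste(letter_order[letter_order %in% ch], collapse = "")
  }, character(1))
}

df <- read.table(text = "Subject T O E P Score
1 0 1 0 1 256
2 1 0 1 0 325
2 0 1 0 1 125
3 0 1 0 1 27
4 0 0 0 1 87
5 0 1 0 1 125
6 0 1 1 1 100", header = TRUE)

new_names <- c("T", "O", "E", "P", "TO", "TE", "TP", "OE",
               "OP", "PE", "TOP", "TOE", "POE", "PET")

# Combination label per row, letters in T, O, E, P order
tp <- c("T", "O", "E", "P")
df$comb <- apply(df[tp], 1, function(x) paste(tp[x == 1], collapse = ""))

# Subject x combination table of mean scores (NA where a subject lacks a combination)
means <- tapply(df$Score, list(df$Subject, df$comb), mean)

# Fill each requested column via its canonical name; the rest stay NA
wanted <- canon(new_names)
out <- matrix(NA_real_, nrow = nrow(means), ncol = length(new_names),
              dimnames = list(rownames(means), new_names))
hit <- wanted %in% colnames(means)
out[, hit] <- means[, wanted[hit]]
out <- data.frame(Subject = as.integer(rownames(means)), out,
                  check.names = FALSE)
out
```

With this mapping, subject 6's score of 100 lands in the POE column instead of being dropped.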

Related

Find a cumulative sum of one column between instances of value in second column

I have a data frame that looks like this:
df <- data.frame(Trip =c(rep("A",10),rep("B",10)),
State =c(0,0,0,1,1,1,0,0,1,0,0,1,1,0,0,0,1,1,1,0),
Distance = c(0,2,9,4,3,1,4,5,6,3,2,6,1,5,3,3,6,1,8,2),
DistanceToNext = c(NA,NA,NA,3,1,15,NA,NA,NA,NA,NA,1,17,NA,NA,NA,1,8,NA,NA))
Trip State Distance DistanceToNext
1 A 0 0 NA
2 A 0 2 NA
3 A 0 9 NA
4 A 1 4 3
5 A 1 3 1
6 A 1 1 15
7 A 0 4 NA
8 A 0 5 NA
9 A 1 6 NA
10 A 0 3 NA
11 B 0 2 NA
12 B 1 6 1
13 B 1 1 17
14 B 0 5 NA
15 B 0 3 NA
16 B 0 3 NA
17 B 1 6 1
18 B 1 1 8
19 B 1 8 NA
20 B 0 2 NA
The State column indicates whether a fishing boat is fishing (State = 1) or not fishing (State = 0). I want to calculate the Distance travelled between each fishing event (State = 1).
The Distance column gives the distance between that row's location and the previous row's location (i.e. it is the lag distance).
The DistanceToNext column is the result I am trying to generate. It should be NA for all rows in a Trip up to the first row where the fishing State = 1. For that row, DistanceToNext should equal the sum of the Distance column over the subsequent rows, up to and including the next row with State = 1.
For example, row 4 is the first fishing event (State = 1) in Trip A. Its DistanceToNext should be the distance travelled until the next fishing event, which in this case is the very next row (row 5), with a Distance of 3.
For row 5, the next fishing event is again the very next row (row 6), which has a Distance of 1. For row 6, however, there is no further fishing event until row 9, so I want the sum of the Distance column over rows 7 to 9, which is 4 + 5 + 6 = 15.
If a row is the last State = 1 row in its Trip (A or B), there is no later fishing event and hence no distance to calculate, so it should be NA.
Here is another solution. It applies a custom function to the State and Distance vectors of each Trip group and produces the desired output:
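library(dplyr)

fn <- function(State, Distance) {
  out <- rep(NA, length(State))
  inds <- which(State == 1)
  for (i in inds) {
    # the last fishing event in a group has no next event, so it stays NA
    # (this also avoids reading State[i + 1] past the end of the group)
    if (i == inds[length(inds)]) next
    if (State[i + 1] == 1) {
      # the next event is the very next row
      out[i] <- Distance[i + 1]
    } else {
      # sum Distance from the next row up to and including the next event
      nx <- which(inds == i)
      out[i] <- sum(Distance[(i + 1):(inds[nx + 1])])
    }
  }
  out
}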
df %>%
group_by(Trip) %>%
mutate(MyDistance = fn(State, Distance))
# A tibble: 20 x 5
# Groups: Trip [2]
Trip State Distance DistanceToNext MyDistance
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 0 0 NA NA
2 A 0 2 NA NA
3 A 0 9 NA NA
4 A 1 4 3 3
5 A 1 3 1 1
6 A 1 1 15 15
7 A 0 4 NA NA
8 A 0 5 NA NA
9 A 1 6 NA NA
10 A 0 3 NA NA
11 B 0 2 NA NA
12 B 1 6 1 1
13 B 1 1 17 17
14 B 0 5 NA NA
15 B 0 3 NA NA
16 B 0 3 NA NA
17 B 1 6 1 1
18 B 1 1 8 8
19 B 1 8 NA NA
20 B 0 2 NA NA
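In base R, you could do:
fun <- function(df){
  a <- which(df$State == 1)
  b <- rep(NA, nrow(df))
  # Distance summed from the row after each event up to the next event;
  # unlist() stops an empty list() (a trip with one fishing event) from turning b into a list
  d <- unlist(mapply(function(x, y) sum(df$Distance[(x+1):y]), head(a,-1), tail(a, -1)))
  b[a] <- c(d, NA)
  transform(df, DisttoNext = b)
}
do.call(rbind, by(df, df$Trip, fun))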
Trip State Distance DistanceToNext DisttoNext
A.1 A 0 0 NA NA
A.2 A 0 2 NA NA
A.3 A 0 9 NA NA
A.4 A 1 4 3 3
A.5 A 1 3 1 1
A.6 A 1 1 15 15
A.7 A 0 4 NA NA
A.8 A 0 5 NA NA
A.9 A 1 6 NA NA
A.10 A 0 3 NA NA
B.11 B 0 2 NA NA
B.12 B 1 6 1 1
B.13 B 1 1 17 17
B.14 B 0 5 NA NA
B.15 B 0 3 NA NA
B.16 B 0 3 NA NA
B.17 B 1 6 1 1
B.18 B 1 1 8 8
B.19 B 1 8 NA NA
B.20 B 0 2 NA NA
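All of these answers sum Distance from the row after one fishing event up to and including the next event. That sum also equals the difference of the running total cumsum(Distance) at the two events, which allows a loop-free base R sketch. This is an alternative not taken from the answers above, and dist_to_next() is a hypothetical name:

```r
df <- data.frame(Trip = c(rep("A", 10), rep("B", 10)),
                 State = c(0,0,0,1,1,1,0,0,1,0,0,1,1,0,0,0,1,1,1,0),
                 Distance = c(0,2,9,4,3,1,4,5,6,3,2,6,1,5,3,3,6,1,8,2))

dist_to_next <- function(State, Distance) {
  cs <- cumsum(Distance)              # running distance within the trip
  a <- which(State == 1)              # rows of fishing events
  out <- rep(NA_real_, length(State))
  # distance from event a[k] to a[k + 1] is cs[a[k + 1]] - cs[a[k]];
  # the last event of the trip keeps NA
  out[a] <- c(diff(cs[a]), NA)
  out
}

# Apply per Trip and put the pieces back in the original row order
df$DistToNext <- unsplit(
  lapply(split(df, df$Trip), function(g) dist_to_next(g$State, g$Distance)),
  df$Trip)
df
```

For Trip A, for example, the running totals at the events (rows 4, 5, 6, 9) are 15, 18, 19 and 34, and their differences 3, 1 and 15 are the wanted values.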
A data.table alternative.
library(data.table)
setDT(df)
df[,`:=`(next_dist = shift(Distance, type = "lead"), g = cumsum(State), ri = .I),
by = Trip]
d = df[ , .(ri = ri[1], State = State[1], s = sum(next_dist)), by = .(Trip, g)]
df[d[State == 1, .SD[-.N], by = Trip], on = .(ri), s := s]
df[ , `:=`(ri = NULL, next_dist = NULL, g = NULL)]
# Trip State Distance DistanceToNext s
# 1: A 0 0 NA NA
# 2: A 0 2 NA NA
# 3: A 0 9 NA NA
# 4: A 1 4 3 3
# 5: A 1 3 1 1
# 6: A 1 1 15 15
# 7: A 0 4 NA NA
# 8: A 0 5 NA NA
# 9: A 1 6 NA NA
# 10: A 0 3 NA NA
# 11: B 0 2 NA NA
# 12: B 1 6 1 1
# 13: B 1 1 17 17
# 14: B 0 5 NA NA
# 15: B 0 3 NA NA
# 16: B 0 3 NA NA
# 17: B 1 6 1 1
# 18: B 1 1 8 8
# 19: B 1 8 NA NA
# 20: B 0 2 NA NA
Explanation:
Convert data to data.table (setDT(df)).
For each 'Trip' (by = Trip), create new variables by reference (:=): next distance (shift(Distance, type = "lead")), a grouping variable which increases every time 'State' is 1 (cumsum(State)), and a row index used to join the result (.I; this could also be done first, without the grouping).
For each 'Trip' and 'State group' (by = .(Trip, g)), select first row index (ri[1]), first 'State' (State = State[1]), and sum the lead distances (sum(next_dist)).
From the result above, select rows where 'State' is 1 (State == 1). Then, for each 'Trip' (by = Trip), select the subset of data (.SD) except the last row (-.N). Join to the original data on row index (on = .(ri)). Create a new column, sum of distances, 's' by reference (:=). If desired, remove temp variables.

R Insert Value within Dataframe

I have a rather complex problem; I hope someone can help. I want to copy a value from one row (Player 1 or Player 2) into two other rows (Players 3 and 4) if and only if these players are in the same Treatment, Group and Period AND that player was actually picked (see column Player.Picked).
I know that with tidyverse I can group_by my columns of interest: Treatment, Group, and Period.
However, I am unsure how to proceed with the condition that Player Picked is fulfilled and then how to extract this value appropriately for the players 3 and 4 in the same treatment, group, period.
The column extracted.Player1/2Value shows the desired output (I have filled in the first four correct values manually).
Any ideas? Help would be very much appreciated. Thanks a lot in advance!
df
T Player Group Player.Picked Period Player1/2Value extracted.Player1/2Value
1 1 6 1 1 10
1 2 6 1 1 9
1 3 5 2 1 NA -> 4
1 4 6 1 1 NA -> 10
1 5 3 1 1 NA
1 1 5 2 1 8
1 2 1 0 1 7
1 3 6 1 1 NA -> 10
1 4 2 2 1 NA
1 5 2 2 1 NA
1 1 1 0 1 7
1 2 2 2 1 11
1 3 3 1 1 NA
1 4 4 1 1 NA
1 5 4 1 1 NA
1 1 2 2 1 21
1 2 4 1 1 17
1 3 1 0 1 NA
1 4 5 2 1 NA -> 4
1 5 6 1 1 NA
1 1 3 1 1 12
1 2 3 1 1 15
1 3 4 1 1 NA
1 4 1 0 1 NA
1 5 1 0 1 NA
1 1 4 1 1 11
1 2 5 2 1 4
1 3 2 2 1 NA
1 4 3 1 1 NA
1 5 5 2 1 NA
I'm not sure if I understood the required logic; here I'm assuming that Player 5 always picks Player 1 or 2 per Group.
So, here is my go at this using library(data.table):
library(data.table)
DT <- data.table::data.table(
check.names = FALSE,
T = c(1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L),
Player = c(1L,2L,3L,
4L,5L,1L,2L,3L,4L,5L,1L,2L,3L,4L,5L,
1L,2L,3L,4L,5L,1L,2L,3L,4L,5L,1L,
2L,3L,4L,5L),
Group = c(6L,6L,5L,
6L,3L,5L,1L,6L,2L,2L,1L,2L,3L,4L,4L,
2L,4L,1L,5L,6L,3L,3L,4L,1L,1L,4L,
5L,2L,3L,5L),
Player.Picked = c(1L,1L,2L,
1L,1L,2L,0L,1L,2L,2L,0L,2L,1L,1L,1L,
2L,1L,0L,2L,1L,1L,1L,1L,0L,0L,1L,
2L,2L,1L,2L),
Period = c(1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L),
`Player1/2Value` = c(10L,9L,NA,
NA,NA,8L,7L,NA,NA,NA,7L,11L,NA,NA,
NA,21L,17L,NA,NA,NA,12L,15L,NA,NA,NA,
11L,4L,NA,NA,NA),
`extracted.Player1/2Value` = c(NA,NA,4L,
10L,NA,NA,NA,10L,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA,4L,NA,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA)
)
setorderv(DT, cols = c("T", "Group", "Period", "Player"))
# which player (1 or 2) Player 5 picked in each T/Group/Period
Player5PickedDT <- DT[Player == 5, Player.Picked, by = c("T", "Group", "Period")]
setnames(Player5PickedDT, old = "Player.Picked", new = "Player5Picked")
DT <- DT[Player5PickedDT, on = c("T", "Group", "Period")]
# value of the picked player in each T/Group/Period (none if nobody was picked)
extractedDT <- DT[Player == Player5Picked & Player5Picked > 0, `Player1/2Value`, by = c("T", "Group", "Period")]
setnames(extractedDT, old = "Player1/2Value", new = "extractedValue")
DT[, "Player5Picked" := NULL]
DT <- extractedDT[DT, on = c("T", "Group", "Period")]
# keep the extracted value only for Players 3 and 4
DT[, extractedValue := fifelse(Player %in% c(3, 4), yes = extractedValue, no = NA_real_)]
setcolorder(DT, c("T", "Group", "Period", "Player", "Player.Picked", "Player1/2Value", "extracted.Player1/2Value", "extractedValue"))
DT
The resulting table differs from your expected result (compare extracted.Player1/2Value with extractedValue), but as far as I can tell it follows the logic you described:
T Group Period Player Player.Picked Player1/2Value extracted.Player1/2Value extractedValue
1: 1 1 1 1 0 7 NA NA
2: 1 1 1 2 0 7 NA NA
3: 1 1 1 3 0 NA NA NA
4: 1 1 1 4 0 NA NA NA
5: 1 1 1 5 0 NA NA NA
6: 1 2 1 1 2 21 NA NA
7: 1 2 1 2 2 11 NA NA
8: 1 2 1 3 2 NA NA 11
9: 1 2 1 4 2 NA NA 11
10: 1 2 1 5 2 NA NA NA
11: 1 3 1 1 1 12 NA NA
12: 1 3 1 2 1 15 NA NA
13: 1 3 1 3 1 NA NA 12
14: 1 3 1 4 1 NA NA 12
15: 1 3 1 5 1 NA NA NA
16: 1 4 1 1 1 11 NA NA
17: 1 4 1 2 1 17 NA NA
18: 1 4 1 3 1 NA NA 11
19: 1 4 1 4 1 NA NA 11
20: 1 4 1 5 1 NA NA NA
21: 1 5 1 1 2 8 NA NA
22: 1 5 1 2 2 4 NA NA
23: 1 5 1 3 2 NA 4 4
24: 1 5 1 4 2 NA 4 4
25: 1 5 1 5 2 NA NA NA
26: 1 6 1 1 1 10 NA NA
27: 1 6 1 2 1 9 NA NA
28: 1 6 1 3 1 NA 10 10
29: 1 6 1 4 1 NA 10 10
30: 1 6 1 5 1 NA NA NA
T Group Period Player Player.Picked Player1/2Value extracted.Player1/2Value extractedValue
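For comparison, here is a base R sketch of the same rule (Player 5's pick in each T/Group/Period decides whose value Players 3 and 4 receive), under the same assumption as the data.table answer. It uses only groups 5 and 6 from the question's table, and key, picked, src and extractedValue are illustrative names:

```r
# Groups 5 and 6 from the question's table
df <- data.frame(
  T = 1L,
  Player = rep(1:5, times = 2),
  Group = rep(c(5L, 6L), each = 5),
  Player.Picked = c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L),
  Period = 1L,
  `Player1/2Value` = c(8L, 4L, NA, NA, NA, 10L, 9L, NA, NA, NA),
  check.names = FALSE
)

key <- paste(df$T, df$Group, df$Period)
p5 <- which(df$Player == 5)
# Player 5's pick, looked up for every row of the same T/Group/Period
picked <- df$Player.Picked[p5][match(key, key[p5])]
# Rows of the picked players (a pick of 0 matches no row, giving NA below)
src <- which(df$Player == picked)
val <- df$`Player1/2Value`[src][match(key, key[src])]
# Only Players 3 and 4 receive the value
df$extractedValue <- ifelse(df$Player %in% c(3, 4), val, NA)
df
```

Because match() works on the combined key, the rows do not need to be sorted by group first.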

R - Reorder NULL columns

I have the following question.
Given the data frame db below, I want to rearrange the values so that the NULL (NA) entries sit at both ends of each row, as in db2.
How can I do it dynamically?
Thank you
db<-data.frame(N=c(2,4,6,8),
a=c(1,1,1,1),
b=c(1,1,1,1),
c=c(NA,1,1,1),
d=c(NA,1,1,1),
e=c(NA,NA,1,1),
f=c(NA,NA,1,1),
g=c(NA,NA,NA,1),
h=c(NA,NA,NA,1))
db2<-data.frame(N=c(2,4,6,8),
a=c(NA,NA,NA,1),
b=c(NA,NA,1,1),
c=c(NA,1,1,1),
d=c(1,1,1,1),
e=c(1,1,1,1),
f=c(NA,1,1,1),
g=c(NA,NA,1,1),
h=c(NA,NA,NA,1))
N a b c d e f g h
1 2 NA NA NA 1 1 NA NA NA
2 4 NA NA 1 1 1 1 NA NA
3 6 NA 1 1 1 1 1 1 NA
4 8 1 1 1 1 1 1 1 1
If the number of NAs per row is always even, loop through the rows and rearrange each one by placing half of the NAs at the start and half at the end:
db[-1] <- t(apply(db[-1], 1, function(x) {
i1 <- is.na(x)
if(sum(i1) > 0) setNames(c(rep(NA,sum(i1)/2), x[!i1],
rep(NA, sum(i1)/2)), names(x)) else x}))
db
# N a b c d e f g h
#1 2 NA NA NA 1 1 NA NA NA
#2 4 NA NA 1 1 1 1 NA NA
#3 6 NA 1 1 1 1 1 1 NA
#4 8 1 1 1 1 1 1 1 1
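The sum(i1)/2 approach breaks when a row has an odd number of NAs: rep() truncates 1.5 to 1, the row comes out one element short, and setNames() then fails. A sketch that also handles odd counts, assuming the extra NA should go at the end (center_row() is a hypothetical name):

```r
# Hypothetical helper: move a row's NAs to both ends; with an odd count,
# the extra NA goes to the end
center_row <- function(x) {
  n_na <- sum(is.na(x))
  left <- n_na %/% 2          # NAs placed before the values
  right <- n_na - left        # remaining NAs placed after the values
  setNames(c(rep(NA, left), x[!is.na(x)], rep(NA, right)), names(x))
}

db <- data.frame(N = c(2, 4, 6, 8),
                 a = c(1, 1, 1, 1), b = c(1, 1, 1, 1),
                 c = c(NA, 1, 1, 1), d = c(NA, 1, 1, 1),
                 e = c(NA, NA, 1, 1), f = c(NA, NA, 1, 1),
                 g = c(NA, NA, NA, 1), h = c(NA, NA, NA, 1))
db[-1] <- t(apply(db[-1], 1, center_row))
db

# Odd count: a = NA, b = 1, c = NA, d = NA
center_row(c(a = 1, b = NA, c = NA, d = NA))
```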

How to format describeBy table in R?

I have this data set:
Defects.I Defects.D Treatment
1 2 A
1 3 B
I'm trying to compute descriptive statistics for defects detected and isolated, grouped by treatment.
After searching for a while, I found a nice function in the psych package called describeBy().
With the following code:
describeBy(myData[1:2],myData$Treatment)
I got this output:
Treatment A
Mean. Median. Trimmed.
Defects.I x x x
Defects.D x x x
Treatment B
Mean. Median. Trimmed.
Defects.I x x x
Defects.D x x x
But what I am actually looking for is something like:
Mean. Median. Trimmed.
A B A B A B
Defects.I x x x x x x
Defects.D x x x x x x
Data
myData <- structure(list(Defects.I = c(1L, 1L), Defects.D = 2:3, Treatment = c("A",
"B")), .Names = c("Defects.I", "Defects.D", "Treatment"), class = "data.frame", row.names = c(NA,
-2L))
Since describeBy returns a list of data frames, we could just cbind them all, but that doesn't give the right order. Instead, we can interleave the columns:
myData <- structure(list(Defects.I = c(1L, 1L), Defects.D = 2:3,
Treatment = c("A", "B")),
.Names = c("Defects.I", "Defects.D", "Treatment"),
class = "data.frame", row.names = c(NA, -2L))
l <- psych::describeBy(myData[1:2], myData$Treatment)
So we interleave using this order:
order(sequence(c(ncol(l$A), ncol(l$B))))
# [1] 1 14 2 15 3 16 4 17 5 18 6 19 7 20 8 21 9 22 10 23 11 24 12 25 13 26
rather than the order cbind alone would give:
c(1:13, 1:13)
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13
so this:
do.call('cbind', l)[, order(sequence(lengths(l)))]
# A.vars B.vars A.n B.n A.mean B.mean A.sd B.sd A.median B.median A.trimmed B.trimmed A.mad B.mad
# Defects.I 1 1 1 1 1 1 NA NA 1 1 1 1 0 0
# Defects.D 2 2 1 1 2 3 NA NA 2 3 2 3 0 0
# A.min B.min A.max B.max A.range B.range A.skew B.skew A.kurtosis B.kurtosis A.se B.se
# Defects.I 1 1 1 1 0 0 NA NA NA NA NA NA
# Defects.D 2 3 2 3 0 0 NA NA NA NA NA NA
or as a function
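interleave <- function(l, how = c('cbind', 'rbind')) {
  how <- match.arg(how)
  # sequence() numbers the rows/columns within each piece; order() then takes
  # the 1st of every piece, then the 2nd, and so on
  if (how == 'rbind')
    do.call(how, l)[order(sequence(sapply(l, nrow))), ]
  else do.call(how, l)[, order(sequence(sapply(l, ncol)))]
}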
interleave(l)
# A.vars B.vars A.n B.n
# Defects.I 1 1 1 1
# Defects.D 2 2 1 1 ...
# ...
interleave(l, 'r')
# vars n mean sd median trimmed mad min max range skew kurtosis se
# A.Defects.I 1 1 1 NA 1 1 0 1 1 0 NA NA NA
# B.Defects.I 1 1 1 NA 1 1 0 1 1 0 NA NA NA
# A.Defects.D 2 1 2 NA 2 2 0 2 2 0 NA NA NA
# B.Defects.D 2 1 3 NA 3 3 0 3 3 0 NA NA NA
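To see the interleaving trick in isolation, here is a toy sketch with two small made-up data frames (no psych needed; x, y and parts are illustrative names):

```r
x <- data.frame(mean = 1, sd = 2, n = 3)
y <- data.frame(mean = 4, sd = 5, n = 6)
parts <- list(A = x, B = y)

# sequence() restarts the count for each piece: 1 2 3 1 2 3
pos <- sequence(lengths(parts))
# order() is stable, so ties keep A before B: 1 4 2 5 3 6
idx <- order(pos)
both <- do.call(cbind, parts)[, idx]
names(both)
```

The same idea works for rows with nrow() in place of lengths(), which is what the 'rbind' branch of interleave() does.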
You can try the mat = TRUE argument. It's not exactly what you're looking for, but it's closer:
library(psych)
mydata = data.frame(Defects.I = c(1,1), Defects.D = c(2,3), Treatment = c('A','B'))
describeBy(mydata[1:2], mydata$Treatment, mat = TRUE)
gives
item group1 vars n mean sd median trimmed mad min max range skew kurtosis se
Defects.I1 1 A 1 1 1 NA 1 1 0 1 1 0 NA NA NA
Defects.I2 2 B 1 1 1 NA 1 1 0 1 1 0 NA NA NA
Defects.D1 3 A 2 1 2 NA 2 2 0 2 2 0 NA NA NA
Defects.D2 4 B 2 1 3 NA 3 3 0 3 3 0 NA NA NA
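If psych is not required, the layout asked for in the question (one column per statistic and treatment) can also be built in base R. A minimal sketch, assuming only the mean, the median and a 10% trimmed mean are wanted (psych's describe() documents trim = .1 as its default); stats, per_stat and out are illustrative names:

```r
myData <- data.frame(Defects.I = c(1L, 1L), Defects.D = 2:3,
                     Treatment = c("A", "B"))

# Statistics to report; add more named functions as needed
stats <- list(mean = mean,
              median = median,
              trimmed = function(x) mean(x, trim = 0.1))
vars <- c("Defects.I", "Defects.D")
groups <- sort(unique(myData$Treatment))

# One vars x groups matrix per statistic
per_stat <- lapply(stats, function(f)
  sapply(groups, function(g)
    sapply(myData[myData$Treatment == g, vars, drop = FALSE], f)))

# Bind side by side: mean.A, mean.B, median.A, median.B, ...
out <- do.call(cbind, per_stat)
colnames(out) <- paste(rep(names(stats), each = length(groups)),
                       groups, sep = ".")
out
```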

Count consecutive strings of zeroes and ones over multiple groups

There have been several discussions about counting consecutive strings of zeroes and ones (or other values) using functions like rle or cumsum. I have played around with these functions, but I can't easily figure out how to get them to apply to my specific problem.
I am working with ecological presence/absence data ("pres.abs" = 1 or 0) organized by time ("year") and location ("id"). For each location id, I would like to separately calculate the length of consecutive ones and zeroes through time. Where these cannot be calculated, I want to return "NA".
Below is a sample of what the data looks like (first 3 columns) and the output I am hoping to achieve (last 2 columns). Ideally, this would be a pretty fast function avoiding for-loops since the real data frame contains ~15,000 rows.
year = rep(1:10, times=3)
id = c(rep(1, times=10), rep(2, times=10), rep(3, times=10))
pres.abs.id.1 = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1) #Pres/abs data at site 1 across time
pres.abs.id.2 = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0) #Pres/abs data at site 2 across time
pres.abs.id.3 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1) #Pres/abs data at site 3 across time
pres.abs = c(pres.abs.id.1, pres.abs.id.2, pres.abs.id.3)
dat = data.frame(id, year, pres.abs)
dat$cumul.zeroes = c(1,2,3,NA,NA,NA,1,2,NA,NA,NA,NA,1,NA,1,2,NA,1,2,3,1,2,3,4,5,NA,NA,NA,NA,NA)
dat$cumul.ones = c(NA,NA,NA,1,2,3,NA,NA,1,2,1,2,NA,1,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,2,3,4,5)
> dat
id year pres.abs cumul.zeroes cumul.ones
1 1 1 0 1 NA
2 1 2 0 2 NA
3 1 3 0 3 NA
4 1 4 1 NA 1
5 1 5 1 NA 2
6 1 6 1 NA 3
7 1 7 0 1 NA
8 1 8 0 2 NA
9 1 9 1 NA 1
10 1 10 1 NA 2
11 2 1 1 NA 1
12 2 2 1 NA 2
13 2 3 0 1 NA
14 2 4 1 NA 1
15 2 5 0 1 NA
16 2 6 0 2 NA
17 2 7 1 NA 1
18 2 8 0 1 NA
19 2 9 0 2 NA
20 2 10 0 3 NA
21 3 1 0 1 NA
22 3 2 0 2 NA
23 3 3 0 3 NA
24 3 4 0 4 NA
25 3 5 0 5 NA
26 3 6 1 NA 1
27 3 7 1 NA 2
28 3 8 1 NA 3
29 3 9 1 NA 4
30 3 10 1 NA 5
Thanks very much for your help.
Here's a base R way using rle and sequence:
dat <- within(dat, {
cumul.counts <- unlist(lapply(split(pres.abs, id), function(x) sequence(rle(x)$lengths)))
cumul.zeroes <- replace(cumul.counts, pres.abs == 1, NA)
cumul.ones <- replace(cumul.counts, pres.abs == 0, NA)
rm(cumul.counts)
})
# id year pres.abs cumul.ones cumul.zeroes
# 1 1 1 0 NA 1
# 2 1 2 0 NA 2
# 3 1 3 0 NA 3
# 4 1 4 1 1 NA
# 5 1 5 1 2 NA
# 6 1 6 1 3 NA
# 7 1 7 0 NA 1
# 8 1 8 0 NA 2
# 9 1 9 1 1 NA
# 10 1 10 1 2 NA
# 11 2 1 1 1 NA
# 12 2 2 1 2 NA
# 13 2 3 0 NA 1
# 14 2 4 1 1 NA
# 15 2 5 0 NA 1
# 16 2 6 0 NA 2
# 17 2 7 1 1 NA
# 18 2 8 0 NA 1
# 19 2 9 0 NA 2
# 20 2 10 0 NA 3
# 21 3 1 0 NA 1
# 22 3 2 0 NA 2
# 23 3 3 0 NA 3
# 24 3 4 0 NA 4
# 25 3 5 0 NA 5
# 26 3 6 1 1 NA
# 27 3 7 1 2 NA
# 28 3 8 1 3 NA
# 29 3 9 1 4 NA
# 30 3 10 1 5 NA
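The rle() answer relies on split() returning the ids in the same order as the rows, i.e. on dat being sorted by id. A sketch that drops that assumption by using ave(), which returns results in the original row order (it still assumes rows are in year order within each id; run_len is an illustrative name):

```r
year <- rep(1:10, times = 3)
id <- rep(1:3, each = 10)
pres.abs <- c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
              1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
              0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
dat <- data.frame(id, year, pres.abs)

# Position within the current run of equal values, computed per id
run_len <- ave(dat$pres.abs, dat$id,
               FUN = function(x) sequence(rle(x)$lengths))
dat$cumul.zeroes <- ifelse(dat$pres.abs == 0, run_len, NA)
dat$cumul.ones   <- ifelse(dat$pres.abs == 1, run_len, NA)
dat
```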
Here's one option with dplyr:
library(dplyr)
dat %>%
group_by(id, x = cumsum(c(0,diff(pres.abs)) != 0)) %>%
mutate(cumul.zeros = ifelse(pres.abs, NA_integer_, row_number()),
cumul.ones = ifelse(!pres.abs, NA_integer_, row_number())) %>%
ungroup() %>% select(-x)
#Source: local data frame [30 x 5]
#
# id year pres.abs cumul.zeros cumul.ones
#1 1 1 0 1 NA
#2 1 2 0 2 NA
#3 1 3 0 3 NA
#4 1 4 1 NA 1
#5 1 5 1 NA 2
#6 1 6 1 NA 3
#7 1 7 0 1 NA
#8 1 8 0 2 NA
#9 1 9 1 NA 1
#10 1 10 1 NA 2
#11 2 1 1 NA 1
#12 2 2 1 NA 2
#13 2 3 0 1 NA
#14 2 4 1 NA 1
#15 2 5 0 1 NA
#16 2 6 0 2 NA
#17 2 7 1 NA 1
#18 2 8 0 1 NA
#19 2 9 0 2 NA
#20 2 10 0 3 NA
#21 3 1 0 1 NA
#22 3 2 0 2 NA
#23 3 3 0 3 NA
#24 3 4 0 4 NA
#25 3 5 0 5 NA
#26 3 6 1 NA 1
#27 3 7 1 NA 2
#28 3 8 1 NA 3
#29 3 9 1 NA 4
#30 3 10 1 NA 5
