Here is part of the sample data:
dat<-read.table (text=" ID Time B1 T1 Q1 W1 M1
1 12 12 0 12 11 9
1 13 0 1 NA NA NA
2 10 12 0 6 7 8
2 14 0 1 NA NA NA
1 16 16A 0 1 2 4
1 14 0 1 NA NA NA
2 14 16A 0 5 6 7
2 7 0 1 NA NA NA
1 7 20 0 5 8 0
1 7 0 1 NA NA NA
2 9 20 0 7 8 1
2 9 0 1 NA NA NA
", header=TRUE)
I want to update the value 1 in column T1 for repeated IDs: the first time an ID repeats it should get a value of 1, the second time a value of 2, the third time a value of 3, and so on. I also want to replace NA with blank cells. Here is the expected outcome:
ID Time B1 T1 Q1 W1 M1
1 12 12 0 12 11 9
1 13 0 1
2 10 12 0 6 7 8
2 14 0 1
1 16 16A 0 1 2 4
1 14 0 2
2 14 16A 0 5 6 7
2 7 0 2
1 7 20 0 5 8 0
1 7 0 3
2 9 20 0 7 8 1
2 9 0 3
You could use ifelse() inside across() with a cumsum() per group, like this:
library(dplyr)
dat %>%
  group_by(ID, B1) %>%
  # within each (ID, B1) group, number the rows where T1 is 1; leave the rest as is
  mutate(across(T1, ~ ifelse(.x == 1, cumsum(.x), .x)))
#> # A tibble: 12 × 7
#> # Groups: ID, B1 [8]
#> ID Time B1 T1 Q1 W1 M1
#> <int> <int> <chr> <int> <int> <int> <int>
#> 1 1 12 12 0 12 11 9
#> 2 1 13 0 1 NA NA NA
#> 3 2 10 12 0 6 7 8
#> 4 2 14 0 1 NA NA NA
#> 5 1 16 16A 0 1 2 4
#> 6 1 14 0 2 NA NA NA
#> 7 2 14 16A 0 5 6 7
#> 8 2 7 0 2 NA NA NA
#> 9 1 7 20 0 5 8 0
#> 10 1 7 0 3 NA NA NA
#> 11 2 9 20 0 7 8 1
#> 12 2 9 0 3 NA NA NA
Created on 2023-01-14 with reprex v2.0.2
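The question also asks for NA to be shown as blank cells, which the output above does not do. A minimal base R sketch of one way to do that, assuming it is acceptable for the columns to become character (a numeric column cannot hold an empty string); the helper name blank_na and the small data frame are made up for illustration:

```r
# Hypothetical helper: convert every column to character and blank out the NAs.
blank_na <- function(df) {
  df[] <- lapply(df, function(col) {
    col <- as.character(col)
    col[is.na(col)] <- ""
    col
  })
  df
}

# Small made-up sample in the shape of the question's data
sample_dat <- data.frame(
  ID = c(1, 1, 2, 2),
  T1 = c(0, 1, 0, 1),
  Q1 = c(12, NA, 6, NA)
)

out <- blank_na(sample_dat)
print(out)
```

If the blanks are only needed in an exported file, write.csv(dat, "file.csv", na = "") keeps the columns numeric in R and writes the NAs as empty fields.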
With data.table
library(data.table)
setDT(dat)[T1 ==1, T1 := cumsum(T1), .(ID, B1)]
-output
> dat
ID Time B1 T1 Q1 W1 M1
1: 1 12 12 0 12 11 9
2: 1 13 0 1 NA NA NA
3: 2 10 12 0 6 7 8
4: 2 14 0 1 NA NA NA
5: 1 16 16A 0 1 2 4
6: 1 14 0 2 NA NA NA
7: 2 14 16A 0 5 6 7
8: 2 7 0 2 NA NA NA
9: 1 7 20 0 5 8 0
10: 1 7 0 3 NA NA NA
11: 2 9 20 0 7 8 1
12: 2 9 0 3 NA NA NA
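For comparison, the same per-ID counter can be sketched in base R with ave(). This is only an illustration on a small made-up data frame, and it assumes the rows to be numbered are exactly those where T1 equals 1:

```r
# Made-up sample: T1 is 1 on the rows that should be numbered within each ID
sample_dat <- data.frame(
  ID = c(1, 1, 2, 2, 1, 1, 2, 2),
  T1 = c(0, 1, 0, 1, 0, 1, 0, 1)
)

# Rows to renumber
is_rep <- sample_dat$T1 == 1

# ave() applies seq_along() within each ID, restricted to the selected rows
sample_dat$T1[is_rep] <- ave(sample_dat$T1[is_rep], sample_dat$ID[is_rep],
                             FUN = seq_along)

print(sample_dat)
```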
Related
A sample similar to my data is:
dat1<-read.table (text=" ID Rat Garden Class Time1 Time2 Time3
1 12 12 0 15 16 20
1 13 0 1 NA NA NA
2 13 11 0 18 12 16
2 9 0 1 NA NA NA
1 6 13 0 17 14 14
1 7 0 2 NA NA NA
2 4 14 0 17 16 12
2 3 0 2 NA NA NA
", header=TRUE)
dat2<-read.table (text=" ID Value1 Value2
1 6 7
2 5 4
", header=TRUE)
I want to insert the values of dat2 into the Time1 column of dat1, in the rows where the Class column is 1 or 2.
I would like to get the following outcome:
ID Rat Garden Class Time1 Time2 Time3
1 12 12 0 15 16 20
1 13 0 1 6
2 13 11 0 18 12 16
2 9 0 1 5
1 6 13 0 17 14 14
1 7 0 2 7
2 4 14 0 17 16 12
2 3 0 2 4
We may group by 'ID' and replace the NA values in 'Time1' with the unlisted 'Value' columns of 'dat2' from the row where the ID matches:
library(dplyr)
dat1 %>%
group_by(ID) %>%
mutate(Time1 = replace(Time1, is.na(Time1),
unlist(dat2[-1][dat2$ID == cur_group()$ID,]))) %>%
ungroup
-output
# A tibble: 8 × 7
ID Rat Garden Class Time1 Time2 Time3
<int> <int> <int> <int> <int> <int> <int>
1 1 12 12 0 15 16 20
2 1 13 0 1 6 NA NA
3 2 13 11 0 18 12 16
4 2 9 0 1 5 NA NA
5 1 6 13 0 17 14 14
6 1 7 0 2 7 NA NA
7 2 4 14 0 17 16 12
8 2 3 0 2 4 NA NA
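An explicit lookup is another way to express the same fill. The sketch below is base R on made-up data shaped like dat1 and dat2, and it assumes that Class k should take its value from the column named Value followed by k (Class 1 from Value1, Class 2 from Value2), which is how the expected output reads:

```r
sample1 <- data.frame(
  ID    = c(1, 1, 2, 2, 1, 1, 2, 2),
  Class = c(0, 1, 0, 1, 0, 2, 0, 2),
  Time1 = c(15, NA, 18, NA, 17, NA, 17, NA)
)
sample2 <- data.frame(ID = c(1, 2), Value1 = c(6, 5), Value2 = c(7, 4))

# Rows of sample1 that need a value
fill <- is.na(sample1$Time1)

# Row of sample2 for each such row (by ID) and column (assumed "Value<Class>")
row_idx <- match(sample1$ID[fill], sample2$ID)
col_idx <- match(paste0("Value", sample1$Class[fill]), names(sample2))

# Index the lookup table with a two-column (row, column) matrix
sample1$Time1[fill] <- as.matrix(sample2)[cbind(row_idx, col_idx)]

print(sample1)
```

Unlike the positional replacement, this does not depend on the order in which the NA rows appear within each ID.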
Here is a wild ride:
First we pull the values from dat2 as a vector.
Then we interleave an NA after each value, so the vector has as many elements as dat1 has rows, and
finally we use coalesce after cbind:
library(dplyr)
library(tidyr)
vector <- dat2 %>%
pivot_longer(-ID) %>%
arrange(name) %>%
pull(value)
col_x <- c(sapply(vector, c, rep(NA, 1)))
cbind(dat1, col_x) %>%
mutate(col_x = lag(col_x)) %>%
mutate(Time1= coalesce(Time1, col_x), .keep="unused")
ID Rat Garden Class Time1 Time2 Time3
1 1 12 12 0 15 16 20
2 1 13 0 1 6 NA NA
3 2 13 11 0 18 12 16
4 2 9 0 1 5 NA NA
5 1 6 13 0 17 14 14
6 1 7 0 2 7 NA NA
7 2 4 14 0 17 16 12
8 2 3 0 2 4 NA NA
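The interleaving step is the least obvious part of this answer, so here it is in isolation. This is a small sketch using the four values that the pivot produces for dat2; c(rbind(...)) is shown as an equivalent base R idiom rather than something the answer itself uses:

```r
vals <- c(6, 5, 7, 4)

# As in the answer: each value is followed by one NA
interleaved <- c(sapply(vals, c, NA))

# Equivalent idiom: stack the values over a row of NAs, then read column by column
interleaved_alt <- c(rbind(vals, NA))

print(interleaved)
```

The lag() in the answer then shifts this vector down by one position so that each value lines up with an NA row of Time1.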
I would like to filter a data frame to remove all rows before a particular value in a specific column. For example, in the data frame below, I would like to remove all rows that come before the "1" in column x, within each value of column a. Please note that "1" occurs many times (once per value of a), and it is the "NA" rows before each "1" in column x that I want to remove.
Thanks
a b x
1 1 NA
1 2 NA
1 3 1
1 4 0
1 5 0
1 6 NA
1 7 NA
2 1 NA
2 2 NA
2 3 1
2 4 NA
2 5 0
2 6 0
2 7 NA
3 1 NA
3 2 NA
3 3 NA
3 4 NA
3 5 1
3 6 0
3 7 NA
The desired output would look like this:
a b x
1 3 1
1 4 0
1 5 0
1 6 NA
1 7 NA
2 3 1
2 4 NA
2 5 0
2 6 0
2 7 NA
3 5 1
3 6 0
3 7 NA
Does this solve your problem?
library(tidyverse)
dat <- read.table(text = "a b x
1 1 NA
1 2 NA
1 3 1
1 4 0
1 5 0
1 6 NA
1 7 NA
2 1 NA
2 2 NA
2 3 1
2 4 NA
2 5 0
2 6 0
2 7 NA
3 1 NA
3 2 NA
3 3 NA
3 4 NA
3 5 1
3 6 0
3 7 NA", header = TRUE)
dat %>%
group_by(a) %>%
filter(cummax(!is.na(x)) == 1)
#> # A tibble: 13 × 3
#> # Groups: a [3]
#> a b x
#> <int> <int> <int>
#> 1 1 3 1
#> 2 1 4 0
#> 3 1 5 0
#> 4 1 6 NA
#> 5 1 7 NA
#> 6 2 3 1
#> 7 2 4 NA
#> 8 2 5 0
#> 9 2 6 0
#> 10 2 7 NA
#> 11 3 5 1
#> 12 3 6 0
#> 13 3 7 NA
Created on 2021-12-07 by the reprex package (v2.0.1)
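Note that the filter above keeps everything from the first non-NA value of x in each group, which matches the sample data because the first non-NA value is always the 1. If x could contain a 0 before the first 1, a stricter variant is needed. A base R sketch on made-up data, assuming the rule is "keep rows from the first x == 1 onward within each a":

```r
sample_dat <- data.frame(
  a = c(1, 1, 1, 1, 2, 2, 2),
  b = c(1, 2, 3, 4, 1, 2, 3),
  x = c(NA, 0, 1, NA, NA, 1, 0)
)

# %in% treats NA as "not a match", so this is 1 only where x is exactly 1
is_one <- as.integer(sample_dat$x %in% 1)

# Running count of 1s within each a; it is positive from the first 1 onward
seen_one <- ave(is_one, sample_dat$a, FUN = cumsum) > 0

result <- sample_dat[seen_one, ]
print(result)
```

Here the row with x = 0 that precedes the first 1 in group 1 is dropped, whereas the cummax(!is.na(x)) filter would keep it.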
Here is the code for my example dataset.
df = data.frame("group" =c(rep(1,5),rep(1,6),rep(2,4),rep(2,3)), "time" = c(rep(NA,5),seq(1,6),rep(NA,4),seq(1,3)), "p" = seq(1,18) )
group time p
1 1 NA 1
2 1 NA 2
3 1 NA 3
4 1 NA 4
5 1 NA 5
6 1 1 6
7 1 2 7
8 1 3 8
9 1 4 9
10 1 5 10
11 1 6 11
12 2 NA 12
13 2 NA 13
14 2 NA 14
15 2 NA 15
16 2 1 16
17 2 2 17
18 2 3 18
I would like to figure out how to apply a function by group to only the rows that have a time value, then append the result as a new column in the data frame. Here is the example function I would like to apply.
pfunc <- function(p){
p+5
}
The output I am hoping to obtain would look as follows.
group time p new_p
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 11
7 1 2 7 12
8 1 3 8 13
9 1 4 9 14
10 1 5 10 15
11 1 6 11 16
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 21
17 2 2 17 22
18 2 3 18 23
You can try this:
library(dplyr)
df %>% group_by(group) %>%
mutate(pnew=ifelse(is.na(time),time,time+5))
# A tibble: 18 x 4
# Groups: group [2]
group time p pnew
<dbl> <int> <int> <dbl>
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 6
7 1 2 7 7
8 1 3 8 8
9 1 4 9 9
10 1 5 10 10
11 1 6 11 11
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 6
17 2 2 17 7
18 2 3 18 8
Update
You can use this function:
increase <- function(data,n)
{
data %>% group_by(group) %>%
mutate(pnew=ifelse(is.na(time),time,time+n)) -> result
return(result)
}
increase(df,n = 10)
# A tibble: 18 x 4
# Groups: group [2]
group time p pnew
<dbl> <int> <int> <dbl>
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 11
7 1 2 7 12
8 1 3 8 13
9 1 4 9 14
10 1 5 10 15
11 1 6 11 16
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 11
17 2 2 17 12
18 2 3 18 13
Update 2
I hope this helps:
df %>% group_by(group) %>% rowwise() %>% mutate(pnew=ifelse(is.na(time),NA,pfunc(time)))
# A tibble: 18 x 4
# Rowwise: group
group time p pnew
<dbl> <int> <int> <dbl>
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 6
7 1 2 7 7
8 1 3 8 8
9 1 4 9 9
10 1 5 10 10
11 1 6 11 11
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 6
17 2 2 17 7
18 2 3 18 8
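The snippets above apply the increment to time, while the expected output in the question adds 5 to p on the rows where time is present. A minimal base R sketch of that variant on a small made-up data frame; the column name new_p follows the question, and because pfunc works element by element no grouping is needed here:

```r
sample_df <- data.frame(
  group = c(1, 1, 1, 2, 2),
  time  = c(NA, 1, 2, NA, 1),
  p     = 1:5
)

pfunc <- function(p) {
  p + 5
}

# Apply pfunc to p only where time is present; other rows get NA
sample_df$new_p <- ifelse(is.na(sample_df$time), NA, pfunc(sample_df$p))

print(sample_df)
```

For a function whose result depends on the whole group (a group mean, for example), the same condition can be kept and the call wrapped in ave() or in a grouped mutate().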
I'm trying to assign NAs to the first two rows of each event, with the following conditional statement:
If the first day of each event has a value of "variable" = 0, check the day before. If the day before (last day of previous event) has a "variable" > 0, then assign NAs to the first two rows of the event having "variable" = 0 on the first day. If the day before has a "variable" = 0, do nothing.
Here is an example:
day <- c(1:16)
event<- c(1,1,2,3,4,4,4,5,5,5,6,6,6,7,7,7)
variable<- c(0,0,5,0,0,0,10,0,1,1,0,0,0,0,0,0)
A<- data.frame(day, event, variable)
day event variable
1 1 1 0
2 2 1 0
3 3 2 5
4 4 3 0
5 5 4 0
6 6 4 0
7 7 4 10
8 8 5 0
9 9 5 1
10 10 5 1
11 11 6 0
12 12 6 0
13 13 6 0
14 14 7 0
15 15 7 0
16 16 7 0
And this is how it should look:
day event variable
1 1 1 0
2 2 1 0
3 3 2 5
4 4 3 NA
5 5 4 0
6 6 4 0
7 7 4 10
8 8 5 NA
9 9 5 NA
10 10 5 1
11 11 6 NA
12 12 6 NA
13 13 6 0
14 14 7 0
15 15 7 0
16 16 7 0
Note: It doesn't matter whether event 1 gets assigned NAs.
I tried to do this with if conditions, but it is not working well. Any ideas? Thanks in advance!
EDIT: New example data from OP
library(data.table)
event2<- c(1,2,2,3,4,4,4,4,4,5,5)
variable2<- c(140, 0, 69, 569, 28, 0,0,0,100,0,0)
desire_output<- c(140, NA, NA, 569, 28, 0,0,0,100, NA,NA)
A2<- data.frame(event2, variable2, desire_output)
setDT(A2)
A2[,first_days_event:=fifelse(.I==min(.I),1,fifelse(.I==min(.I)+1,2,NA_integer_)),by=.(event2)]
A2[,result:={v <- variable2
for (i in 2:.N) {
if (is.na(first_days_event[i])) {
v[i] <- variable2[i]
} else if (first_days_event[i]==1 & variable2[i]==0){
if (variable2[i-1]>0) {
v[i] <- NA_integer_
if (first_days_event[i+1]==2) {
v[i+1] <- NA_integer_
}
}
}
}
v}]
A2
#> event2 variable2 desire_output first_days_event result
#> 1: 1 140 140 1 140
#> 2: 2 0 NA 1 NA
#> 3: 2 69 NA 2 NA
#> 4: 3 569 569 1 569
#> 5: 4 28 28 1 28
#> 6: 4 0 0 2 0
#> 7: 4 0 0 NA 0
#> 8: 4 0 0 NA 0
#> 9: 4 100 100 NA 100
#> 10: 5 0 NA 1 NA
#> 11: 5 0 NA 2 NA
I would use this simple loop solution. You just need to create a flag indicating the first two days of each event.
library(data.table)
day <- c(1:16)
event<- c(1,1,2,3,4,4,4,5,5,5,6,6,6,7,7,7)
variable<- c(0,0,5,0,0,0,10,0,1,1,0,0,0,0,0,0)
A<- data.frame(day, event, variable)
setDT(A)
A[,first_days_event:=fifelse(.I==min(.I),1,fifelse(.I==min(.I)+1,2,NA_integer_)),by=.(event)]
A[,result:={v <- variable  # start from the original values so rows the loop skips keep them
for (i in 2:.N) {
if (is.na(first_days_event[i])) {
v[i] <- variable[i]
} else if (first_days_event[i]==1){
if (variable[i-1]>0) {
v[i] <- NA_integer_
if (first_days_event[i+1]==2) {
v[i+1] <- NA_integer_
}
} else {
v[i] <- variable[i]
}
}
}
v}]
A
#> day event variable first_days_event result
#> 1: 1 1 0 1 0
#> 2: 2 1 0 2 0
#> 3: 3 2 5 1 5
#> 4: 4 3 0 1 NA
#> 5: 5 4 0 1 0
#> 6: 6 4 0 2 0
#> 7: 7 4 10 NA 10
#> 8: 8 5 0 1 NA
#> 9: 9 5 1 2 NA
#> 10: 10 5 1 NA 1
#> 11: 11 6 0 1 NA
#> 12: 12 6 0 2 NA
#> 13: 13 6 0 NA 0
#> 14: 14 7 0 1 0
#> 15: 15 7 0 2 0
#> 16: 16 7 0 NA 0
Here is a potential tidyverse approach.
You can store the last value of each group in a temporary column last_var and use lag to move it to the first row of the following group for comparison.
Note that the default in lag determines what the first row of event 1 is compared against, and therefore whether variable in event 1 stays 0 or becomes NA.
The final mutate checks whether a row is within the first 2 rows of its group, and uses last_var to decide whether to set it to NA or leave it alone.
Edit: The ifelse also needs to check that the variable on the first day of the event is 0.
library(tidyverse)
A %>%
group_by(event) %>%
mutate(last_var = ifelse(row_number() == n(), last(variable), 0)) %>%
ungroup %>%
mutate(last_var = lag(last_var, default = 0)) %>%
group_by(event) %>%
mutate(variable = ifelse(row_number() <= 2 & first(last_var) > 0 & first(variable) == 0, NA, variable)) %>%
select(-last_var)
Output
# A tibble: 16 x 3
# Groups: event [7]
day event variable
<int> <dbl> <dbl>
1 1 1 0
2 2 1 0
3 3 2 5
4 4 3 NA
5 5 4 0
6 6 4 0
7 7 4 10
8 8 5 NA
9 9 5 NA
10 10 5 1
11 11 6 NA
12 12 6 NA
13 13 6 0
14 14 7 0
15 15 7 0
16 16 7 0
With the second data frame included in the comments:
Output
# A tibble: 11 x 3
# Groups: event [5]
event variable desire_output
<dbl> <dbl> <dbl>
1 1 140 140
2 2 NA NA
3 2 NA NA
4 3 569 569
5 4 28 28
6 4 0 0
7 4 0 0
8 4 0 0
9 4 100 100
10 5 NA NA
11 5 NA NA
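The same rule can also be written without a loop or a temporary column. The following base R sketch uses the first example data and assumes the rows are ordered by day, each event occupies consecutive rows, and the very first row is compared against 0 (so event 1 is never blanked):

```r
event    <- c(1, 1, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7)
variable <- c(0, 0, 5, 0, 0, 0, 10, 0, 1, 1, 0, 0, 0, 0, 0, 0)
A <- data.frame(day = 1:16, event, variable)

# Position of each row within its event (1 = first day, 2 = second day, ...)
pos_in_event <- ave(seq_along(A$event), A$event, FUN = seq_along)

# Value on the previous day; 0 is assumed for the very first row
prev_value <- c(0, head(A$variable, -1))

# An event is triggered when its first day is 0 and the day before is > 0
trigger_row <- pos_in_event == 1 & A$variable == 0 & prev_value > 0

# Spread the trigger from the first row to every row of the event
event_triggered <- ave(as.integer(trigger_row), A$event, FUN = max) == 1

# Blank the first two days of triggered events, keep everything else
A$result <- ifelse(event_triggered & pos_in_event <= 2, NA, A$variable)

print(A)
```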
There have been several discussions about counting consecutive strings of zeroes and ones (or other values) using functions like rle or cumsum. I have played around with these functions, but I can't easily figure out how to get them to apply to my specific problem.
I am working with ecological presence/absence data ("pres.abs" = 1 or 0) organized by time ("year") and location ("id"). For each location id, I would like to separately calculate the length of consecutive ones and zeroes through time. Where these cannot be calculated, I want to return "NA".
Below is a sample of what the data looks like (first 3 columns) and the output I am hoping to achieve (last 2 columns). Ideally, this would be a pretty fast function avoiding for-loops since the real data frame contains ~15,000 rows.
year = rep(1:10, times=3)
id = c(rep(1, times=10), rep(2, times=10), rep(3, times=10))
pres.abs.id.1 = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1) #Pres/abs data at site 1 across time
pres.abs.id.2 = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0) #Pres/abs data at site 2 across time
pres.abs.id.3 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1) #Pres/abs data at site 3 across time
pres.abs = c(pres.abs.id.1, pres.abs.id.2, pres.abs.id.3)
dat = data.frame(id, year, pres.abs)
dat$cumul.zeroes = c(1,2,3,NA,NA,NA,1,2,NA,NA,NA,NA,1,NA,1,2,NA,1,2,3,1,2,3,4,5,NA,NA,NA,NA,NA)
dat$cumul.ones = c(NA,NA,NA,1,2,3,NA,NA,1,2,1,2,NA,1,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,2,3,4,5)
> dat
id year pres.abs cumul.zeroes cumul.ones
1 1 1 0 1 NA
2 1 2 0 2 NA
3 1 3 0 3 NA
4 1 4 1 NA 1
5 1 5 1 NA 2
6 1 6 1 NA 3
7 1 7 0 1 NA
8 1 8 0 2 NA
9 1 9 1 NA 1
10 1 10 1 NA 2
11 2 1 1 NA 1
12 2 2 1 NA 2
13 2 3 0 1 NA
14 2 4 1 NA 1
15 2 5 0 1 NA
16 2 6 0 2 NA
17 2 7 1 NA 1
18 2 8 0 1 NA
19 2 9 0 2 NA
20 2 10 0 3 NA
21 3 1 0 1 NA
22 3 2 0 2 NA
23 3 3 0 3 NA
24 3 4 0 4 NA
25 3 5 0 5 NA
26 3 6 1 NA 1
27 3 7 1 NA 2
28 3 8 1 NA 3
29 3 9 1 NA 4
30 3 10 1 NA 5
Thanks very much for your help.
Here's a base R way using rle and sequence:
dat <- within(dat, {
cumul.counts <- unlist(lapply(split(pres.abs, id), function(x) sequence(rle(x)$lengths)))
cumul.zeroes <- replace(cumul.counts, pres.abs == 1, NA)
cumul.ones <- replace(cumul.counts, pres.abs == 0, NA)
rm(cumul.counts)
})
# id year pres.abs cumul.ones cumul.zeroes
# 1 1 1 0 NA 1
# 2 1 2 0 NA 2
# 3 1 3 0 NA 3
# 4 1 4 1 1 NA
# 5 1 5 1 2 NA
# 6 1 6 1 3 NA
# 7 1 7 0 NA 1
# 8 1 8 0 NA 2
# 9 1 9 1 1 NA
# 10 1 10 1 2 NA
# 11 2 1 1 1 NA
# 12 2 2 1 2 NA
# 13 2 3 0 NA 1
# 14 2 4 1 1 NA
# 15 2 5 0 NA 1
# 16 2 6 0 NA 2
# 17 2 7 1 1 NA
# 18 2 8 0 NA 1
# 19 2 9 0 NA 2
# 20 2 10 0 NA 3
# 21 3 1 0 NA 1
# 22 3 2 0 NA 2
# 23 3 3 0 NA 3
# 24 3 4 0 NA 4
# 25 3 5 0 NA 5
# 26 3 6 1 1 NA
# 27 3 7 1 2 NA
# 28 3 8 1 3 NA
# 29 3 9 1 4 NA
# 30 3 10 1 5 NA
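To see what the rle and sequence combination does, it helps to run it on a single site. A small sketch using the site 1 vector from the question:

```r
x <- c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1)

# rle() compresses the vector into runs of equal values
runs <- rle(x)
run_lengths <- runs$lengths   # how long each run is
run_values  <- runs$values    # which value each run holds

# sequence() expands each run length n into 1, 2, ..., n
run_pos <- sequence(run_lengths)

print(run_lengths)
print(run_pos)
```

The answer then masks run_pos with NA wherever pres.abs has the other value, which yields the two separate columns.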
Here's one option with dplyr:
require(dplyr)
dat %>%
group_by(id, x = cumsum(c(0,diff(pres.abs)) != 0)) %>%
mutate(cumul.zeros = ifelse(pres.abs, NA_integer_, row_number()),
cumul.ones = ifelse(!pres.abs, NA_integer_, row_number())) %>%
ungroup() %>% select(-x)
#Source: local data frame [30 x 5]
#
# id year pres.abs cumul.zeros cumul.ones
#1 1 1 0 1 NA
#2 1 2 0 2 NA
#3 1 3 0 3 NA
#4 1 4 1 NA 1
#5 1 5 1 NA 2
#6 1 6 1 NA 3
#7 1 7 0 1 NA
#8 1 8 0 2 NA
#9 1 9 1 NA 1
#10 1 10 1 NA 2
#11 2 1 1 NA 1
#12 2 2 1 NA 2
#13 2 3 0 1 NA
#14 2 4 1 NA 1
#15 2 5 0 1 NA
#16 2 6 0 2 NA
#17 2 7 1 NA 1
#18 2 8 0 1 NA
#19 2 9 0 2 NA
#20 2 10 0 3 NA
#21 3 1 0 1 NA
#22 3 2 0 2 NA
#23 3 3 0 3 NA
#24 3 4 0 4 NA
#25 3 5 0 5 NA
#26 3 6 1 NA 1
#27 3 7 1 NA 2
#28 3 8 1 NA 3
#29 3 9 1 NA 4
#30 3 10 1 NA 5
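The grouping variable x in the dplyr answer is a run identifier: it increases by one every time pres.abs changes. A small sketch of that step alone, on a made-up vector:

```r
pres_abs <- c(1, 1, 0, 1, 0, 0)

# diff() is non-zero wherever the value changes; the leading 0 pads the first element
changed <- c(0, diff(pres_abs)) != 0

# The running count of changes labels each run of equal values
run_id <- cumsum(changed)

print(run_id)
```

Grouping by id together with this run identifier means row_number() restarts at 1 at the start of every run, and also at the start of every site even when the value does not change across the site boundary.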