How to update a value in a specific column in R

Here is part of the sample data:
dat<-read.table (text=" ID Time B1 T1 Q1 W1 M1
1 12 12 0 12 11 9
1 13 0 1 NA NA NA
2 10 12 0 6 7 8
2 14 0 1 NA NA NA
1 16 16A 0 1 2 4
1 14 0 1 NA NA NA
2 14 16A 0 5 6 7
2 7 0 1 NA NA NA
1 7 20 0 5 8 0
1 7 0 1 NA NA NA
2 9 20 0 7 8 1
2 9 0 1 NA NA NA
", header=TRUE)
I want to update the value 1 in column T1 so that it counts the repeats of each ID: the first time an ID repeats, T1 should be 1; the second time, 2; the third time, 3; and so on. I also want to replace NA with blank cells. Here is the expected outcome:
ID Time B1 T1 Q1 W1 M1
1 12 12 0 12 11 9
1 13 0 1
2 10 12 0 6 7 8
2 14 0 1
1 16 16A 0 1 2 4
1 14 0 2
2 14 16A 0 5 6 7
2 7 0 2
1 7 20 0 5 8 0
1 7 0 3
2 9 20 0 7 8 1
2 9 0 3

You could use ifelse inside across, with a cumsum per group, like this:
library(dplyr)
dat %>%
group_by(ID, B1) %>%
mutate(across(T1, ~ ifelse(.x == 1, cumsum(.x), T1)))
#> # A tibble: 12 × 7
#> # Groups: ID, B1 [8]
#> ID Time B1 T1 Q1 W1 M1
#> <int> <int> <chr> <int> <int> <int> <int>
#> 1 1 12 12 0 12 11 9
#> 2 1 13 0 1 NA NA NA
#> 3 2 10 12 0 6 7 8
#> 4 2 14 0 1 NA NA NA
#> 5 1 16 16A 0 1 2 4
#> 6 1 14 0 2 NA NA NA
#> 7 2 14 16A 0 5 6 7
#> 8 2 7 0 2 NA NA NA
#> 9 1 7 20 0 5 8 0
#> 10 1 7 0 3 NA NA NA
#> 11 2 9 20 0 7 8 1
#> 12 2 9 0 3 NA NA NA
Created on 2023-01-14 with reprex v2.0.2

With data.table
library(data.table)
setDT(dat)[T1 ==1, T1 := cumsum(T1), .(ID, B1)]
Output:
> dat
ID Time B1 T1 Q1 W1 M1
1: 1 12 12 0 12 11 9
2: 1 13 0 1 NA NA NA
3: 2 10 12 0 6 7 8
4: 2 14 0 1 NA NA NA
5: 1 16 16A 0 1 2 4
6: 1 14 0 2 NA NA NA
7: 2 14 16A 0 5 6 7
8: 2 7 0 2 NA NA NA
9: 1 7 20 0 5 8 0
10: 1 7 0 3 NA NA NA
11: 2 9 20 0 7 8 1
12: 2 9 0 3 NA NA NA
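Neither answer above blanks the NA cells that the question also asks for. A minimal base R sketch follows, assuming it is acceptable to convert Q1, W1 and M1 to character (a numeric column cannot hold an empty string). The names is_rep, na_cols and out are introduced here for illustration only.

```r
dat <- read.table(text = "ID Time B1 T1 Q1 W1 M1
1 12 12 0 12 11 9
1 13 0 1 NA NA NA
2 10 12 0 6 7 8
2 14 0 1 NA NA NA
1 16 16A 0 1 2 4
1 14 0 1 NA NA NA
2 14 16A 0 5 6 7
2 7 0 1 NA NA NA
1 7 20 0 5 8 0
1 7 0 1 NA NA NA
2 9 20 0 7 8 1
2 9 0 1 NA NA NA
", header = TRUE)

# Running count of the T1 == 1 rows within each ID
is_rep <- dat$T1 == 1
dat$T1[is_rep] <- ave(dat$T1[is_rep], dat$ID[is_rep], FUN = cumsum)

# Blank cells only exist as text, so the NA-bearing columns become character
out <- dat
na_cols <- c("Q1", "W1", "M1")
out[na_cols] <- lapply(out[na_cols],
                       function(x) ifelse(is.na(x), "", as.character(x)))

print(out, row.names = FALSE)
```

If the columns must stay numeric for later calculations, it may be better to keep the NA values and blank them only when exporting, for example with write.csv(dat, na = "").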

Related

How can I insert values from one data frame into another data frame

Data similar to mine:
dat1<-read.table (text=" ID Rat Garden Class Time1 Time2 Time3
1 12 12 0 15 16 20
1 13 0 1 NA NA NA
2 13 11 0 18 12 16
2 9 0 1 NA NA NA
1 6 13 0 17 14 14
1 7 0 2 NA NA NA
2 4 14 0 17 16 12
2 3 0 2 NA NA NA
", header=TRUE)
dat2<-read.table (text=" ID Value1 Value2
1 6 7
2 5 4
", header=TRUE)
I want to insert the values of dat2 into the Time1 column of dat1, in the rows where the Class column is 1 or 2.
I want to get the following outcome:
ID Rat Garden Class Time1 Time2 Time3
1 12 12 0 15 16 20
1 13 0 1 6
2 13 11 0 18 12 16
2 9 0 1 5
1 6 13 0 17 14 14
1 7 0 2 7
2 4 14 0 17 16 12
2 3 0 2 4
We may group by 'ID' and replace the NA values in 'Time1' with the unlisted 'Value' columns of 'dat2' for the matching ID:
library(dplyr)
dat1 %>%
group_by(ID) %>%
mutate(Time1 = replace(Time1, is.na(Time1),
unlist(dat2[-1][dat2$ID == cur_group()$ID,]))) %>%
ungroup
Output:
# A tibble: 8 × 7
ID Rat Garden Class Time1 Time2 Time3
<int> <int> <int> <int> <int> <int> <int>
1 1 12 12 0 15 16 20
2 1 13 0 1 6 NA NA
3 2 13 11 0 18 12 16
4 2 9 0 1 5 NA NA
5 1 6 13 0 17 14 14
6 1 7 0 2 7 NA NA
7 2 4 14 0 17 16 12
8 2 3 0 2 4 NA NA
Here is a wild ride:
First we pull the values from dat2 as a vector.
Then we interleave an NA after each value, so the vector has as many elements as dat1 has rows, and
finally we cbind it to dat1, shift it down one row with lag, and use coalesce:
library(dplyr)
library(tidyr)
vector <- dat2 %>%
pivot_longer(-ID) %>%
arrange(name) %>%
pull(value)
col_x <- c(sapply(vector, c, rep(NA, 1)))
cbind(dat1, col_x) %>%
mutate(col_x = lag(col_x)) %>%
mutate(Time1= coalesce(Time1, col_x), .keep="unused")
ID Rat Garden Class Time1 Time2 Time3
1 1 12 12 0 15 16 20
2 1 13 0 1 6 NA NA
3 2 13 11 0 18 12 16
4 2 9 0 1 5 NA NA
5 1 6 13 0 17 14 14
6 1 7 0 2 7 NA NA
7 2 4 14 0 17 16 12
8 2 3 0 2 4 NA NA
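The same lookup can also be written without grouping or reshaping. This is a minimal base R sketch, assuming that Class k always refers to column Value k of dat2 (which holds for the sample data but is not stated in the question). The names rows and vals are illustrative.

```r
dat1 <- read.table(text = "ID Rat Garden Class Time1 Time2 Time3
1 12 12 0 15 16 20
1 13 0 1 NA NA NA
2 13 11 0 18 12 16
2 9 0 1 NA NA NA
1 6 13 0 17 14 14
1 7 0 2 NA NA NA
2 4 14 0 17 16 12
2 3 0 2 NA NA NA
", header = TRUE)
dat2 <- read.table(text = "ID Value1 Value2
1 6 7
2 5 4
", header = TRUE)

# Rows that need a value: Class is 1 or 2
rows <- which(dat1$Class > 0)

# Value columns as a matrix, so one (row, column) pair can be looked up per target row
vals <- as.matrix(dat2[c("Value1", "Value2")])

# Row of dat2 is found by matching ID; column is given by Class (assumption: Class k -> Value k)
dat1$Time1[rows] <- vals[cbind(match(dat1$ID[rows], dat2$ID), dat1$Class[rows])]

dat1
```

Indexing a matrix with a two-column matrix picks one cell per row of the index, which is what makes the lookup a single vectorised step.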

Filter to remove all rows before a particular value in a specific column, when this value occurs several times

I would like to remove all rows before a particular value in a specific column. For example, in the data frame below, I would like to remove all rows that come before the "1" in column x, each time a "1" occurs. Note that the value "1" repeats many times: within each group of column a, I want to remove the NA rows that precede the "1" in column x.
Thanks
a b x
1 1 NA
1 2 NA
1 3 1
1 4 0
1 5 0
1 6 NA
1 7 NA
2 1 NA
2 2 NA
2 3 1
2 4 NA
2 5 0
2 6 0
2 7 NA
3 1 NA
3 2 NA
3 3 NA
3 4 NA
3 5 1
3 6 0
3 7 NA
The desired output would look like this:
a b x
1 3 1
1 4 0
1 5 0
1 6 NA
1 7 NA
2 3 1
2 4 NA
2 5 0
2 6 0
2 7 NA
3 5 1
3 6 0
3 7 NA
Does this solve your problem?
library(tidyverse)
dat <- read.table(text = "a b x
1 1 NA
1 2 NA
1 3 1
1 4 0
1 5 0
1 6 NA
1 7 NA
2 1 NA
2 2 NA
2 3 1
2 4 NA
2 5 0
2 6 0
2 7 NA
3 1 NA
3 2 NA
3 3 NA
3 4 NA
3 5 1
3 6 0
3 7 NA", header = TRUE)
dat %>%
group_by(a) %>%
filter(cummax(!is.na(x)) == 1)
#> # A tibble: 13 × 3
#> # Groups: a [3]
#> a b x
#> <int> <int> <int>
#> 1 1 3 1
#> 2 1 4 0
#> 3 1 5 0
#> 4 1 6 NA
#> 5 1 7 NA
#> 6 2 3 1
#> 7 2 4 NA
#> 8 2 5 0
#> 9 2 6 0
#> 10 2 7 NA
#> 11 3 5 1
#> 12 3 6 0
#> 13 3 7 NA
Created on 2021-12-07 by the reprex package (v2.0.1)
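The filter above keeps everything from the first non-NA value of x in each group. In the sample data that is always the "1", but if a 0 could appear before the first 1, a condition on the value itself would be needed. A hedged base R sketch of that variant follows; the names is_one, seen and res are illustrative.

```r
dat <- read.table(text = "a b x
1 1 NA
1 2 NA
1 3 1
1 4 0
1 5 0
1 6 NA
1 7 NA
2 1 NA
2 2 NA
2 3 1
2 4 NA
2 5 0
2 6 0
2 7 NA
3 1 NA
3 2 NA
3 3 NA
3 4 NA
3 5 1
3 6 0
3 7 NA", header = TRUE)

# TRUE where x is exactly 1 (%in% treats NA as "no match" instead of returning NA)
is_one <- dat$x %in% 1

# Number of 1s seen so far within each group of a; rows before the first 1 have count 0
seen <- ave(as.integer(is_one), dat$a, FUN = cumsum)

res <- dat[seen > 0, ]
res
```

On the sample data this returns the same 13 rows as the dplyr answer.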

R apply function to groups within data frame adding result as additional column

Here is the code for my example dataset.
df = data.frame("group" =c(rep(1,5),rep(1,6),rep(2,4),rep(2,3)), "time" = c(rep(NA,5),seq(1,6),rep(NA,4),seq(1,3)), "p" = seq(1,18) )
group time p
1 1 NA 1
2 1 NA 2
3 1 NA 3
4 1 NA 4
5 1 NA 5
6 1 1 6
7 1 2 7
8 1 3 8
9 1 4 9
10 1 5 10
11 1 6 11
12 2 NA 12
13 2 NA 13
14 2 NA 14
15 2 NA 15
16 2 1 16
17 2 2 17
18 2 3 18
I would like to figure out how to apply a function by group, only to the rows that have a time value, and then append the result as a new column in the data frame. Here is the example function I would like to apply.
pfunc <- function(p){
p+5
}
The output I am hoping to obtain would look as follows.
group time p new_p
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 11
7 1 2 7 12
8 1 3 8 13
9 1 4 9 14
10 1 5 10 15
11 1 6 11 16
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 21
17 2 2 17 22
18 2 3 18 23
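You can try this (note that it adds 5 to time rather than applying pfunc to p, so pnew differs from the requested new_p):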
library(dplyr)
df %>% group_by(group) %>%
mutate(pnew=ifelse(is.na(time),time,time+5))
# A tibble: 18 x 4
# Groups: group [2]
group time p pnew
<dbl> <int> <int> <dbl>
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 6
7 1 2 7 7
8 1 3 8 8
9 1 4 9 9
10 1 5 10 10
11 1 6 11 11
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 6
17 2 2 17 7
18 2 3 18 8
Update
You can use this function:
increase <- function(data,n)
{
data %>% group_by(group) %>%
mutate(pnew=ifelse(is.na(time),time,time+n)) -> result
return(result)
}
increase(df,n = 10)
# A tibble: 18 x 4
# Groups: group [2]
group time p pnew
<dbl> <int> <int> <dbl>
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 11
7 1 2 7 12
8 1 3 8 13
9 1 4 9 14
10 1 5 10 15
11 1 6 11 16
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 11
17 2 2 17 12
18 2 3 18 13
Update 2
I hope this helps:
df %>% group_by(group) %>% rowwise() %>% mutate(pnew=ifelse(is.na(time),NA,pfunc(time)))
# A tibble: 18 x 4
# Rowwise: group
group time p pnew
<dbl> <int> <int> <dbl>
1 1 NA 1 NA
2 1 NA 2 NA
3 1 NA 3 NA
4 1 NA 4 NA
5 1 NA 5 NA
6 1 1 6 6
7 1 2 7 7
8 1 3 8 8
9 1 4 9 9
10 1 5 10 10
11 1 6 11 11
12 2 NA 12 NA
13 2 NA 13 NA
14 2 NA 14 NA
15 2 NA 15 NA
16 2 1 16 6
17 2 2 17 7
18 2 3 18 8
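The outputs above are computed from time, whereas the expected new_p in the question equals pfunc(p) on the rows where time is present. A minimal base R sketch that reproduces the requested column, assuming pfunc is vectorised as in the question:

```r
df <- data.frame(group = c(rep(1, 5), rep(1, 6), rep(2, 4), rep(2, 3)),
                 time  = c(rep(NA, 5), seq(1, 6), rep(NA, 4), seq(1, 3)),
                 p     = seq(1, 18))

pfunc <- function(p) {
  p + 5
}

# Apply pfunc to p only where time is present; the other rows stay NA
df$new_p <- ifelse(is.na(df$time), NA, pfunc(df$p))

df
```

Because pfunc works element by element, no grouping is needed here. For a function whose result depends on the whole group (a group mean, say), the same pattern could be wrapped in ave(df$p, df$group, FUN = ...) or in a grouped mutate.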

Assigning NAs to rows with conditional statement in r

I'm trying to assign NAs to the first two rows of each event, with the following conditional statement:
If the first day of each event has a value of "variable" = 0, check the day before. If the day before (last day of previous event) has a "variable" > 0, then assign NAs to the first two rows of the event having "variable" = 0 on the first day. If the day before has a "variable" = 0, do nothing.
Here is an example:
day <- c(1:16)
event<- c(1,1,2,3,4,4,4,5,5,5,6,6,6,7,7,7)
variable<- c(0,0,5,0,0,0,10,0,1,1,0,0,0,0,0,0)
A<- data.frame(day, event, variable)
day event variable
1 1 1 0
2 2 1 0
3 3 2 5
4 4 3 0
5 5 4 0
6 6 4 0
7 7 4 10
8 8 5 0
9 9 5 1
10 10 5 1
11 11 6 0
12 12 6 0
13 13 6 0
14 14 7 0
15 15 7 0
16 16 7 0
And this is how it should look:
day event variable
1 1 1 0
2 2 1 0
3 3 2 5
4 4 3 NA
5 5 4 0
6 6 4 0
7 7 4 10
8 8 5 NA
9 9 5 NA
10 10 5 1
11 11 6 NA
12 12 6 NA
13 13 6 0
14 14 7 0
15 15 7 0
16 16 7 0
Note: It doesn't matter whether event 1 is assigned NAs.
I tried to do this with if conditions, but it is not working well. Any ideas? Thanks in advance!
EDIT: New example data from OP
library(data.table)
event2<- c(1,2,2,3,4,4,4,4,4,5,5)
variable2<- c(140, 0, 69, 569, 28, 0,0,0,100,0,0)
desire_output<- c(140, NA, NA, 569, 28, 0,0,0,100, NA,NA)
A2<- data.frame(event2, variable2, desire_output)
setDT(A2)
A2[,first_days_event:=fifelse(.I==min(.I),1,fifelse(.I==min(.I)+1,2,NA_integer_)),by=.(event2)]
A2[,result:={v <- variable2
for (i in 2:.N) {
if (is.na(first_days_event[i])) {
v[i] <- variable2[i]
} else if (first_days_event[i]==1 && variable2[i]==0){
if (variable2[i-1]>0) {
v[i] <- NA_integer_
if (i < .N && first_days_event[i+1]==2) {
v[i+1] <- NA_integer_
}
}
}
}
v}]
A2
#> event2 variable2 desire_output first_days_event result
#> 1: 1 140 140 1 140
#> 2: 2 0 NA 1 NA
#> 3: 2 69 NA 2 NA
#> 4: 3 569 569 1 569
#> 5: 4 28 28 1 28
#> 6: 4 0 0 2 0
#> 7: 4 0 0 NA 0
#> 8: 4 0 0 NA 0
#> 9: 4 100 100 NA 100
#> 10: 5 0 NA 1 NA
#> 11: 5 0 NA 2 NA
I would use this simple loop solution. It only needs a flag marking the first two days of each event.
library(data.table)
day <- c(1:16)
event<- c(1,1,2,3,4,4,4,5,5,5,6,6,6,7,7,7)
variable<- c(0,0,5,0,0,0,10,0,1,1,0,0,0,0,0,0)
A<- data.frame(day, event, variable)
setDT(A)
A[,first_days_event:=fifelse(.I==min(.I),1,fifelse(.I==min(.I)+1,2,NA_integer_)),by=.(event)]
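A[,result:={v <- variable  # start from the original values so untouched rows keep them
  for (i in 2:.N) {
    # only the first day of an event that starts at 0 can trigger NAs
    if (!is.na(first_days_event[i]) && first_days_event[i]==1 && variable[i]==0) {
      # the row before is the last day of the previous event
      if (variable[i-1]>0) {
        v[i] <- NA
        # blank the second day too, without reading past the last row
        if (i < .N && first_days_event[i+1]==2) {
          v[i+1] <- NA
        }
      }
    }
  }
  v}]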
A
#> day event variable first_days_event result
#> 1: 1 1 0 1 0
#> 2: 2 1 0 2 0
#> 3: 3 2 5 1 5
#> 4: 4 3 0 1 NA
#> 5: 5 4 0 1 0
#> 6: 6 4 0 2 0
#> 7: 7 4 10 NA 10
#> 8: 8 5 0 1 NA
#> 9: 9 5 1 2 NA
#> 10: 10 5 1 NA 1
#> 11: 11 6 0 1 NA
#> 12: 12 6 0 2 NA
#> 13: 13 6 0 NA 0
#> 14: 14 7 0 1 0
#> 15: 15 7 0 2 0
#> 16: 16 7 0 NA 0
Here is a potential tidyverse approach.
You can store the last value of each group in a temporary column last_var and use lag to move it to the first row of the following group for comparison.
Note that the default passed to lag determines whether a leading 0 in event 1 stays 0 or becomes NA.
The final mutate checks whether a row is within the first 2 rows of its group, and uses last_var to decide whether to set it to NA or leave it alone.
Edit: The ifelse also needs to check that the variable on the event's first day is 0.
library(tidyverse)
A %>%
group_by(event) %>%
mutate(last_var = ifelse(row_number() == n(), last(variable), 0)) %>%
ungroup %>%
mutate(last_var = lag(last_var, default = 0)) %>%
group_by(event) %>%
mutate(variable = ifelse(row_number() <= 2 & first(last_var) > 0 & first(variable) == 0, NA, variable)) %>%
select(-last_var)
Output
# A tibble: 16 x 3
# Groups: event [7]
day event variable
<int> <dbl> <dbl>
1 1 1 0
2 2 1 0
3 3 2 5
4 4 3 NA
5 5 4 0
6 6 4 0
7 7 4 10
8 8 5 NA
9 9 5 NA
10 10 5 1
11 11 6 NA
12 12 6 NA
13 13 6 0
14 14 7 0
15 15 7 0
16 16 7 0
With the second data frame included in the comments:
Output
# A tibble: 11 x 3
# Groups: event [5]
event variable desire_output
<dbl> <dbl> <dbl>
1 1 140 140
2 2 NA NA
3 2 NA NA
4 3 569 569
5 4 28 28
6 4 0 0
7 4 0 0
8 4 0 0
9 4 100 100
10 5 NA NA
11 5 NA NA
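The rule can also be expressed without a loop or extra packages. This is a base R sketch on the first example, assuming the rows are in day order and that each event number forms one contiguous block. The helper names pos, prev, trigger_row and trigger are illustrative.

```r
day <- c(1:16)
event <- c(1,1,2,3,4,4,4,5,5,5,6,6,6,7,7,7)
variable <- c(0,0,5,0,0,0,10,0,1,1,0,0,0,0,0,0)
A <- data.frame(day, event, variable)

# Position of each row within its event (1 = first day, 2 = second day, ...)
pos <- ave(seq_along(A$event), A$event, FUN = seq_along)

# Value on the previous day; the very first row has no previous day
prev <- c(NA, head(A$variable, -1))

# An event is triggered when its first day is 0 and the day before was > 0
trigger_row <- pos == 1 & A$variable == 0 & !is.na(prev) & prev > 0

# Spread the flag from the first row to every row of the same event
trigger <- ave(as.integer(trigger_row), A$event, FUN = max) == 1

# Blank the first two days of every triggered event
A$result <- ifelse(trigger & pos <= 2, NA, A$variable)
A
```

Single-day events such as event 3 are handled naturally, because only positions that exist are blanked.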

Count consecutive strings of zeroes and ones over multiple groups

There have been several discussions about counting consecutive strings of zeroes and ones (or other values) using functions like rle or cumsum. I have played around with these functions, but I can't easily figure out how to get them to apply to my specific problem.
I am working with ecological presence/absence data ("pres.abs" = 1 or 0) organized by time ("year") and location ("id"). For each location id, I would like to separately calculate the length of consecutive ones and zeroes through time. Where these cannot be calculated, I want to return "NA".
Below is a sample of what the data looks like (first 3 columns) and the output I am hoping to achieve (last 2 columns). Ideally, this would be a pretty fast function avoiding for-loops since the real data frame contains ~15,000 rows.
year = rep(1:10, times=3)
id = c(rep(1, times=10), rep(2, times=10), rep(3, times=10))
pres.abs.id.1 = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1) #Pres/abs data at site 1 across time
pres.abs.id.2 = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0) #Pres/abs data at site 2 across time
pres.abs.id.3 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1) #Pres/abs data at site 3 across time
pres.abs = c(pres.abs.id.1, pres.abs.id.2, pres.abs.id.3)
dat = data.frame(id, year, pres.abs)
dat$cumul.zeroes = c(1,2,3,NA,NA,NA,1,2,NA,NA,NA,NA,1,NA,1,2,NA,1,2,3,1,2,3,4,5,NA,NA,NA,NA,NA)
dat$cumul.ones = c(NA,NA,NA,1,2,3,NA,NA,1,2,1,2,NA,1,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,2,3,4,5)
> dat
id year pres.abs cumul.zeroes cumul.ones
1 1 1 0 1 NA
2 1 2 0 2 NA
3 1 3 0 3 NA
4 1 4 1 NA 1
5 1 5 1 NA 2
6 1 6 1 NA 3
7 1 7 0 1 NA
8 1 8 0 2 NA
9 1 9 1 NA 1
10 1 10 1 NA 2
11 2 1 1 NA 1
12 2 2 1 NA 2
13 2 3 0 1 NA
14 2 4 1 NA 1
15 2 5 0 1 NA
16 2 6 0 2 NA
17 2 7 1 NA 1
18 2 8 0 1 NA
19 2 9 0 2 NA
20 2 10 0 3 NA
21 3 1 0 1 NA
22 3 2 0 2 NA
23 3 3 0 3 NA
24 3 4 0 4 NA
25 3 5 0 5 NA
26 3 6 1 NA 1
27 3 7 1 NA 2
28 3 8 1 NA 3
29 3 9 1 NA 4
30 3 10 1 NA 5
Thanks very much for your help.
Here's a base R way using rle and sequence:
dat <- within(dat, {
cumul.counts <- unlist(lapply(split(pres.abs, id), function(x) sequence(rle(x)$lengths)))
cumul.zeroes <- replace(cumul.counts, pres.abs == 1, NA)
cumul.ones <- replace(cumul.counts, pres.abs == 0, NA)
rm(cumul.counts)
})
# id year pres.abs cumul.ones cumul.zeroes
# 1 1 1 0 NA 1
# 2 1 2 0 NA 2
# 3 1 3 0 NA 3
# 4 1 4 1 1 NA
# 5 1 5 1 2 NA
# 6 1 6 1 3 NA
# 7 1 7 0 NA 1
# 8 1 8 0 NA 2
# 9 1 9 1 1 NA
# 10 1 10 1 2 NA
# 11 2 1 1 1 NA
# 12 2 2 1 2 NA
# 13 2 3 0 NA 1
# 14 2 4 1 1 NA
# 15 2 5 0 NA 1
# 16 2 6 0 NA 2
# 17 2 7 1 1 NA
# 18 2 8 0 NA 1
# 19 2 9 0 NA 2
# 20 2 10 0 NA 3
# 21 3 1 0 NA 1
# 22 3 2 0 NA 2
# 23 3 3 0 NA 3
# 24 3 4 0 NA 4
# 25 3 5 0 NA 5
# 26 3 6 1 1 NA
# 27 3 7 1 2 NA
# 28 3 8 1 3 NA
# 29 3 9 1 4 NA
# 30 3 10 1 5 NA
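To see why rle and sequence give the running counts, it may help to look at a single site. A small sketch using the site 1 vector from the question (the names x and r are illustrative):

```r
x <- c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1)  # pres/abs at site 1

r <- rle(x)
r$lengths  # 3 3 2 2: length of each run
r$values   # 0 1 0 1: value repeated in each run

# sequence() restarts a 1..n counter for every run length
counts <- sequence(r$lengths)
counts     # 1 2 3 1 2 3 1 2 1 2
```

The answer applies this per id with split and lapply, then masks the counts with NA depending on whether the row holds a 0 or a 1.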
Here's one option with dplyr:
library(dplyr)
dat %>%
group_by(id, x = cumsum(c(0,diff(pres.abs)) != 0)) %>%
mutate(cumul.zeros = ifelse(pres.abs, NA_integer_, row_number()),
cumul.ones = ifelse(!pres.abs, NA_integer_, row_number())) %>%
ungroup() %>% select(-x)
#Source: local data frame [30 x 5]
#
# id year pres.abs cumul.zeros cumul.ones
#1 1 1 0 1 NA
#2 1 2 0 2 NA
#3 1 3 0 3 NA
#4 1 4 1 NA 1
#5 1 5 1 NA 2
#6 1 6 1 NA 3
#7 1 7 0 1 NA
#8 1 8 0 2 NA
#9 1 9 1 NA 1
#10 1 10 1 NA 2
#11 2 1 1 NA 1
#12 2 2 1 NA 2
#13 2 3 0 1 NA
#14 2 4 1 NA 1
#15 2 5 0 1 NA
#16 2 6 0 2 NA
#17 2 7 1 NA 1
#18 2 8 0 1 NA
#19 2 9 0 2 NA
#20 2 10 0 3 NA
#21 3 1 0 1 NA
#22 3 2 0 2 NA
#23 3 3 0 3 NA
#24 3 4 0 4 NA
#25 3 5 0 5 NA
#26 3 6 1 NA 1
#27 3 7 1 NA 2
#28 3 8 1 NA 3
#29 3 9 1 NA 4
#30 3 10 1 NA 5
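The grouping variable x in the dplyr answer is a run identifier: it increases by one every time pres.abs changes. A short base R sketch of that step on the site 1 vector (run_id and run_len are illustrative names):

```r
pres.abs <- c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1)  # site 1 from the question

# diff() is non-zero exactly where the value changes; cumsum() numbers the runs
run_id <- cumsum(c(0, diff(pres.abs)) != 0)
run_id   # 0 0 0 1 1 1 2 2 3 3

# Counting rows within each run gives the consecutive length so far
run_len <- ave(seq_along(pres.abs), run_id, FUN = seq_along)
run_len  # 1 2 3 1 2 3 1 2 1 2
```

In the full data the run identifier is computed over the whole column, so two sites could share a run number where one ends and the next begins with the same value; grouping by id as well, as the answer does, keeps those runs apart.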
