dplyr how to count cycles in the records - r

For example, if I have records like:
A B
1 2
2 3
3 1
1 2
2 1
Let's say one cycle runs from 1 (to 2 to 3) back to 1, so I need my data frame to look like this:
No. A B
cycle1 1 2
cycle1 2 3
cycle1 3 1
cycle2 1 2
cycle2 2 1
Or, even better for me, I just need to record how many times the same record has appeared so far, like this:
Time A B
Time1 1 2
Time1 2 3
Time1 3 1
Time2 1 2
Time1 2 1
I need this because I have to use the summarize function in dplyr for a calculation, but I cannot group the data by A and B directly. The order of the data is also important.

Is this what you want?
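library(zoo)
T1 <- which(df$A == 1)               # rows where a new cycle starts
T2 <- paste('cycle', seq_along(T1))  # one label per cycle start
df$No <- NA
df$No[T1] <- T2
df$No <- na.locf(df$No)              # carry each label forward to the next start
df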
A B No
1 1 2 cycle 1
2 2 3 cycle 1
3 3 1 cycle 1
4 1 2 cycle 2
5 2 1 cycle 2
# For the second format: mutate() keeps every row in its original order
# while numbering the repeats of each (A, B) pair
library(dplyr)
df %>% group_by(A, B) %>% mutate(Time = paste('Time', row_number()))
A B Time
<int> <int> <chr>
1 1 2 Time 1
2 2 3 Time 1
3 3 1 Time 1
4 1 2 Time 2
5 2 1 Time 1
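For the first format, the zoo step can also be skipped. A minimal sketch, assuming (as in the sample data) that every cycle starts on a row where A is 1; the data frame below only re-creates the question's example:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(A = c(1, 2, 3, 1, 2), B = c(2, 3, 1, 2, 1))

# Each row with A == 1 opens a new cycle, so a running count of those
# rows is the cycle number of every row
res <- df %>% mutate(No = paste0("cycle", cumsum(A == 1)))
res$No
```

If a cycle can start with a value other than 1, this assumption breaks and a different start test is needed.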

Create an augmented 'diff' variable: c(NA, diff(your_var)). Within a sequence group this will be 1. Start a new group wherever that stops holding, i.e. wherever the difference is negative. (My first iteration on the algorithm wasn't quite correct so I modified it slightly.) The code below pads with -1 rather than NA so that the first row opens group 1.
dat %>% as_tibble() %>% mutate(G = cumsum( c(-1, diff(A)) < 0 ) )
# A tibble: 5 x 3
A B G
<int> <int> <int>
1 1 2 1
2 2 3 1
3 3 1 1
4 1 2 2
5 2 1 2
dat %>% as_tibble() %>% mutate(G = paste0( "time", cumsum( c(-1, diff(A)) < 0 ) ))
# A tibble: 5 x 3
A B G
<int> <int> <chr>
1 1 2 time1
2 2 3 time1
3 3 1 time1
4 1 2 time2
5 2 1 time2
One could also test for A == 1, but then sequences like 1,2,3,2,3,4 would not be split properly.
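The difference between the two tests can be seen on that very sequence. A small sketch in base R; the variable names by_start and by_drop are only illustrative:

```r
A <- c(1, 2, 3, 2, 3, 4)

# Running count of rows where A == 1: only one group is ever opened
by_start <- cumsum(A == 1)

# Running count of drops in A: a new group opens whenever the value falls
by_drop <- cumsum(c(-1, diff(A)) < 0)

by_start
by_drop
```

Here by_start puts all six rows in one group, while by_drop splits them into 1,2,3 and 2,3,4.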

Related

issues using first and mutate with group_by

I am using mutate to create a column depending on the first value of a group
library(tidyverse)
test = data.frame(grp = c(1,1,1,2,2,2), x = c(1,2,3,1,2,3), y = c(1,2,3,1,2,3))
test
grp x y
1 1 1 1
2 1 2 2
3 1 3 3
4 2 1 1
5 2 2 2
6 2 3 3
test %>% group_by(grp) %>%
  mutate(y = ifelse(grp[[1]] == x[[1]], y-1, y))
grp x y
<dbl> <dbl> <dbl>
1 1 1 0
2 1 2 0
3 1 3 0
4 2 1 1
5 2 2 1
6 2 3 1
However, the output is not what I expected.
The expected output is:
grp x y
<dbl> <dbl> <dbl>
1 1 1 0
2 1 2 1
3 1 3 2
4 2 1 1
5 2 2 2
6 2 3 3
Can you please explain what is happening and how best to get my expected solution?
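You need to remove the [[1]] index from grp. With grp[[1]] == x[[1]] the test has length 1, so ifelse() returns a single value (computed from the first row of the group), and mutate() recycles it over the whole group. Since grp is the grouping variable, you should avoid indexing it. Just use it as is, i.e.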
library(dplyr)
test %>%
  group_by(grp) %>%
  mutate(new_y = ifelse(grp == first(x), y-1, y))
# A tibble: 6 × 4
# Groups: grp [2]
grp x y new_y
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 0
2 1 2 2 1
3 1 3 3 2
4 2 1 1 1
5 2 2 2 2
6 2 3 3 3
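Because of the x[[1]], you are always comparing the group value of each row with the x value of the first row. I think you want grp == x within ifelse(), although on the sample data that row-by-row comparison gives 0, 2, 3 and 1, 1, 3 rather than the expected output.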
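The length behaviour of ifelse() is easy to check on its own. The sketch below re-creates the question's data; the if/else form is an alternative way to express a per-group scalar condition, not something either answer proposed, and one_value is just an illustrative name:

```r
library(dplyr)

test <- data.frame(grp = c(1, 1, 1, 2, 2, 2),
                   x   = c(1, 2, 3, 1, 2, 3),
                   y   = c(1, 2, 3, 1, 2, 3))

# ifelse() returns a result as long as its test: a length-1 test
# yields only the first element of the chosen branch
one_value <- ifelse(TRUE, c(10, 20, 30), 0)

# A scalar condition per group can be written with if/else instead,
# which returns the whole chosen vector
res <- test %>%
  group_by(grp) %>%
  mutate(new_y = if (first(grp) == first(x)) y - 1 else y) %>%
  ungroup()

res$new_y
```

On this data new_y matches the expected output in the question.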

Code values in new column based on whether values in another column are unique

Given the following data I would like to create a new column new_sequence based on the condition:
If an id occurs only once, the new value should be 0. If an id occurs several times, the new value should be numbered according to the values in sequence.
dat <- tibble(id = c(1,2,3,3,3,4,4),
sequence = c(1,1,1,2,3,1,2))
# A tibble: 7 x 2
id sequence
<dbl> <dbl>
1 1 1
2 2 1
3 3 1
4 3 2
5 3 3
6 4 1
7 4 2
So, for the example data I am looking to produce the following output:
# A tibble: 7 x 3
id sequence new_sequence
<dbl> <dbl> <dbl>
1 1 1 0
2 2 1 0
3 3 1 1
4 3 2 2
5 3 3 3
6 4 1 1
7 4 2 2
I have tried the code below, but it does not work: the first row of every id is coded as 0, not just the ids that occur once.
dat %>% mutate(new_sequence = ifelse(!duplicated(id), 0, sequence))
Use dplyr::add_count() rather than !duplicated():
library(dplyr)
dat %>%
  add_count(id) %>%
  mutate(new_sequence = ifelse(n == 1, 0, sequence)) %>%
  select(!n)
Output:
# A tibble: 7 x 3
id sequence new_sequence
<dbl> <dbl> <dbl>
1 1 1 0
2 2 1 0
3 3 1 1
4 3 2 2
5 3 3 3
6 4 1 1
7 4 2 2
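You can also try the following. After grouping by id, check whether the number of rows in the group, n(), is 1. Use a separate if and else instead of ifelse(): the test n() == 1 is a single value per group, while sequence has one value per row, and ifelse() would return only as many values as the test has.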
dat %>%
  group_by(id) %>%
  mutate(new_sequence = if(n() == 1) 0 else sequence)
Output
id sequence new_sequence
<dbl> <dbl> <dbl>
1 1 1 0
2 2 1 0
3 3 1 1
4 3 2 2
5 3 3 3
6 4 1 1
7 4 2 2
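To see why the choice between ifelse() and if/else matters inside a grouped mutate(), and why !duplicated() fails, here is a small side-by-side sketch on the question's data. The column names with_ifelse and with_if are made up for the comparison:

```r
library(dplyr)

dat <- tibble(id = c(1, 2, 3, 3, 3, 4, 4),
              sequence = c(1, 1, 1, 2, 3, 1, 2))

# TRUE for the first row of every id, not only for ids that occur once
first_seen <- !duplicated(dat$id)

res <- dat %>%
  group_by(id) %>%
  mutate(
    # length-1 test: ifelse() keeps only sequence[1], which mutate() recycles
    with_ifelse = ifelse(n() == 1, 0, sequence),
    # if/else returns the whole sequence vector for multi-row groups
    with_if = if (n() == 1) 0 else sequence
  ) %>%
  ungroup()

res
```

Only with_if reproduces the desired new_sequence; with_ifelse repeats the first sequence value within each multi-row id.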

Check if a character exists in a specific group and create a new column in R

Thank you in advance for your time reading this post. I have a data.frame that looks like this
time offspring
1 1
1 2
2 1
2 5
3 1
3 4
and I would like to check if the offspring of every time point match the offspring of the last time point. To be more explicit I would like to see if the offspring of the time point 1 and time point 2 are present in the timepoint 3.
When this is the case, I would like to assign the offspring the value 1 in a new column, and otherwise the value 0.4.
For example
time offspring alpha
1 1 1
1 2 0.4
2 1 1
2 5 0.4
3 1 1
3 4 1
Any help and comment are highly appreciated.
One dplyr option could be:
df %>%
  group_by(offspring) %>%
  mutate(alpha = pmax(0.4, all(1:3 %in% time)))
time offspring alpha
<int> <int> <dbl>
1 1 1 1
2 1 2 0.4
3 2 1 1
4 2 5 0.4
5 3 1 1
6 3 4 0.4
If cases that are present only at time point 3 should also be given the value 1:
df %>%
  group_by(offspring) %>%
  mutate(alpha = pmax(0.4, all(1:3 %in% time) | unique(time) == 3))
time offspring alpha
<int> <int> <dbl>
1 1 1 1
2 1 2 0.4
3 2 1 1
4 2 5 0.4
5 3 1 1
6 3 4 1
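Another way to read the question is "is this offspring present at the last time point?", which needs no grouping. This is a sketch under that reading, not the answerer's method; the helper column in_last is a made-up name, and the data frame re-creates the question's sample:

```r
library(dplyr)

df <- data.frame(time = c(1, 1, 2, 2, 3, 3),
                 offspring = c(1, 2, 1, 5, 1, 4))

res <- df %>%
  mutate(
    # TRUE when the offspring value also occurs at the last time point
    in_last = offspring %in% offspring[time == max(time)],
    alpha = if_else(in_last, 1, 0.4)
  )

res
```

On the sample data this gives the alpha column shown in the question, and it does not hard-code the time points 1:3.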

counting indicator respect of 2 groups

I have groups, persons within each group, and an indicator. How can I count, for each group, the number of persons whose indicator is 1?
group person ind
1 1 1
1 1 1
1 2 1
2 1 0
2 2 1
2 2 1
Desired output: in the first group 2 persons have 1 in ind, and in the second group only one person does, so
group person ind count
1 1 1 2
1 1 1 2
1 2 1 2
2 1 0 1
2 2 1 1
2 2 1 1
Could do:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(
    count = n_distinct(person[ind == 1])
  )
Output:
# A tibble: 6 x 4
# Groups: group [2]
group person ind count
<int> <int> <int> <int>
1 1 1 1 2
2 1 1 1 2
3 1 2 1 2
4 2 1 0 1
5 2 2 1 1
6 2 2 1 1
Or in data.table:
library(data.table)
setDT(df)[, count := uniqueN(person[ind == 1]), by = group]
An option using base R (here the data frame is called df1; multiplying ind by person assumes no person ID is 0):
df1$count <- with(df1, ave(ind * person, group,
                           FUN = function(x) length(unique(x[x != 0]))))
df1$count
#[1] 2 2 2 1 1 1
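To see what n_distinct(person[ind == 1]) is counting, it can help to collapse the data to one row per group first. A small sketch that re-creates the question's data; per_group and the persons column are illustrative names only:

```r
library(dplyr)

df <- data.frame(group  = c(1, 1, 1, 2, 2, 2),
                 person = c(1, 1, 2, 1, 2, 2),
                 ind    = c(1, 1, 1, 0, 1, 1))

# One row per group: which persons have ind == 1, and how many distinct ones
per_group <- df %>%
  group_by(group) %>%
  summarise(persons = paste(unique(person[ind == 1]), collapse = ","),
            count = n_distinct(person[ind == 1]))

per_group
```

Using mutate() instead of summarise(), as in the answers above, spreads the same per-group count over every row.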

Create a combination ID number from a set of factors in R

Can anyone help me compute a new variable that numbers each distinct combination of some factors?
Assuming there are 4 within-subject factors (A, B, C, D) with 8 repetitions of each combination for each of 10 subjects, this is what my data could look like, to represent its actual structure:
library(AlgDesign) # for generating a factorial design
df <-gen.factorial(c(2,2,2,2,8,10), factors = "all",
varNames = c("A", "B", "C", "D", "replication", "Subject"))
> head(df)
A B C D replication Subject
1 1 1 1 1 1 1
2 2 1 1 1 1 1
3 1 2 1 1 1 1
4 2 2 1 1 1 1
5 1 1 2 1 1 1
6 2 1 2 1 1 1
> tail(df)
A B C D replication Subject
1275 1 2 1 2 8 10
1276 2 2 1 2 8 10
1277 1 1 2 2 8 10
1278 2 1 2 2 8 10
1279 1 2 2 2 8 10
1280 2 2 2 2 8 10
In this example, replication was simply generated in order to force 8 reps, but it doesn't "code" the combination itself.
My original data has only the variables A, B, C, D and Subject, and I'd like to compute replication so that it has a distinct value for each combination of A, B, C, D.
library(AlgDesign)
library(dplyr)
df <-gen.factorial(c(2,2,2,2,8,10), factors = "all",
varNames = c("A", "B", "C", "D", "replication", "Subject"))
df %>%
rowwise() %>% # for each row
mutate(factors = paste0(c(A,B,C,D), collapse = "_")) %>% # create a combination of your factors
ungroup() %>% # forget the row grouping
mutate(replication_upd = as.numeric(factor(factors))) # create a number based on the combination you have
# # A tibble: 1,280 x 8
# A B C D replication Subject factors replication_upd
# <fct> <fct> <fct> <fct> <fct> <fct> <chr> <dbl>
# 1 1 1 1 1 1 1 1_1_1_1 1
# 2 2 1 1 1 1 1 2_1_1_1 9
# 3 1 2 1 1 1 1 1_2_1_1 5
# 4 2 2 1 1 1 1 2_2_1_1 13
# 5 1 1 2 1 1 1 1_1_2_1 3
# 6 2 1 2 1 1 1 2_1_2_1 11
# 7 1 2 2 1 1 1 1_2_2_1 7
# 8 2 2 2 1 1 1 2_2_2_1 15
# 9 1 1 1 2 1 1 1_1_1_2 2
#10 2 1 1 2 1 1 2_1_1_2 10
# # ... with 1,270 more rows
You can remove any unnecessary variables. I left them there so you can see how the process works.
Another option is this
# create a look up table based on unique combinations and assign them a number
df %>% distinct(A,B,C,D) %>% mutate(replication_upd = row_number()) -> look_up
# join back to original dataset
df %>% inner_join(look_up, by=c("A","B","C","D")) %>% as_tibble()
# # A tibble: 1,280 x 7
# A B C D replication Subject replication_upd
# <fct> <fct> <fct> <fct> <fct> <fct> <int>
# 1 1 1 1 1 1 1 1
# 2 2 1 1 1 1 1 2
# 3 1 2 1 1 1 1 3
# 4 2 2 1 1 1 1 4
# 5 1 1 2 1 1 1 5
# 6 2 1 2 1 1 1 6
# 7 1 2 2 1 1 1 7
# 8 2 2 2 1 1 1 8
# 9 1 1 1 2 1 1 9
# 10 2 1 1 2 1 1 10
# # ... with 1,270 more rows
Note that the first approach picks the numbers based on the new variable we create (i.e. it orders by A, B, C, D), while the second approach uses the initial order of your dataset to pick the number for each unique combination.
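A third possibility, sketched here under some assumptions, avoids both the row-wise paste and the join by letting dplyr number the groups itself with cur_group_id() (available in dplyr 1.0.0 and later). The data is a stand-in built with base R's expand.grid() rather than AlgDesign, with integer columns instead of factors, and combo_id is a made-up column name. The second step shows one way to number repetitions within each subject, in case that is what replication is meant to hold:

```r
library(dplyr)

# Stand-in for the question's design, built with base R instead of AlgDesign
df <- expand.grid(A = 1:2, B = 1:2, C = 1:2, D = 1:2,
                  replication = 1:8, Subject = 1:10)

res <- df %>%
  # one ID per distinct (A, B, C, D) combination, in sorted group order
  group_by(A, B, C, D) %>%
  mutate(combo_id = cur_group_id()) %>%
  # running count of each combination within a subject
  group_by(Subject, A, B, C, D) %>%
  mutate(replication_upd = row_number()) %>%
  ungroup()

head(res)
```

Because groups are sorted by A, then B, C and D, combo_id follows the same ordering as the first approach above rather than the order of first appearance.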
