I am trying to write a function for this data and would really appreciate some help!
example <- data.frame(
  Day = rep(c(2, 4, 8, 16, 32, 44), times = 9),
  Replicate = rep(rep(c(1, 2, 3), each = 6), times = 3),
  Treament = rep(c("CC", "HP", "LL"), each = 18),
  AFDM = c(94.669342, 94.465752, 84.897023, 81.435993, 86.556221, 75.328294,
           94.262162, 88.791240, 75.735474, 81.232403, 67.050593, 76.346244,
           95.076522, 88.968823, 83.879073, 73.958836, 70.645724, 67.184695,
           99.763156, 92.022673, 92.245362, 74.513934, 50.083136, 36.979418,
           94.872932, 86.353037, 81.843173, 67.795465, 46.622106, 18.323099,
           95.089932, 93.244212, 81.679814, 65.352385, 18.286525, 7.517794,
           99.559972, 86.759404, 84.693433, 79.196504, 67.456961, 54.765706,
           94.074014, 87.543693, 82.492548, 72.333367, 51.304676, 51.304676,
           98.340870, 86.322153, 87.950873, 84.693433, 63.316485, 63.723665))
For example: I want to insert a new row with an AFDM value (e.g., 0.9823666) that was calculated with another function.
This new row must go before each Day 2 (and be labelled Day 0), preserving the Replicate and Treatment of each group.
Thus, each new row must be: Day = 0, Replicate = same, Treatment = same, AFDM = 0.9823666.
This is so I can later run a regression with the data (from Day 0 to 44, 3 replicates for each Treatment).
I would prefer a solution in dplyr.
Thanks in advance
We can create a grouping column with cumsum, then expand the dataset with complete and fill the other columns with fill:
library(dplyr)
library(tidyr)
example %>%
  group_by(grp = cumsum(Day == 2)) %>%
  complete(Day = c(0, unique(Day)), fill = list(AFDM = 0.9823666)) %>%
  fill(Replicate, Treament, .direction = 'updown')
# A tibble: 63 x 5
# Groups: grp [9]
# grp Day Replicate Treament AFDM
# <int> <dbl> <dbl> <chr> <dbl>
# 1 1 0 1 CC 0.982
# 2 1 2 1 CC 94.7
# 3 1 4 1 CC 94.5
# 4 1 8 1 CC 84.9
# 5 1 16 1 CC 81.4
# 6 1 32 1 CC 86.6
# 7 1 44 1 CC 75.3
# 8 2 0 2 CC 0.982
# 9 2 2 2 CC 94.3
#10 2 4 2 CC 88.8
# … with 53 more rows
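If the helper grp column is not wanted downstream, one option (a minimal sketch; the name example_full is just illustrative) is to ungroup and drop it:
library(dplyr)
library(tidyr)
example_full <- example %>%
  group_by(grp = cumsum(Day == 2)) %>%                                 # new group at every Day 2
  complete(Day = c(0, unique(Day)), fill = list(AFDM = 0.9823666)) %>% # add the Day 0 row
  fill(Replicate, Treament, .direction = 'updown') %>%                 # carry ids into the new row
  ungroup() %>%
  select(-grp)                                                         # drop the helper column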
You can use distinct to get the unique Replicate/Treament combinations, add the Day and AFDM columns with the default values, and bind the rows to the original dataframe:
library(dplyr)
example %>%
  distinct(Replicate, Treament) %>%
  mutate(Day = 0, AFDM = 0.9823666) %>%
  bind_rows(example) %>%
  arrange(Replicate, Treament)
# Replicate Treament Day AFDM
#1 1 CC 0 0.9823666
#2 1 CC 2 94.6693420
#3 1 CC 4 94.4657520
#4 1 CC 8 84.8970230
#5 1 CC 16 81.4359930
#6 1 CC 32 86.5562210
#7 1 CC 44 75.3282940
#8 1 HP 0 0.9823666
#9 1 HP 2 99.7631560
#10 1 HP 4 92.0226730
#.....
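Since the stated goal is a regression from Day 0 to 44 for each Treatment, a hedged follow-up sketch (assuming the combined data is stored in example_full, a hypothetical name for the result of either answer, and that a simple linear model is what is wanted):
fits <- lapply(split(example_full, example_full$Treament),
               function(d) lm(AFDM ~ Day, data = d))  # one fit per Treatment level
lapply(fits, coef)                                    # inspect intercepts and slopes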
Say I have:
df <- data.frame(group = c(1, 1, 1, 1, 2, 2, 2, 2),
                 date = c("2000-01-01", "2000-01-02", "2000-01-04", "2000-01-05",
                          "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-13"),
                 want_group = c(1, 1, 2, 2, 3, 3, 3, 4))
I want to create a want_group variable that groups by group and date, starting a new id whenever the dates stop being consecutive ("daily"). For example, within group 1 the 1st and 2nd share one id and the 4th and 5th get a new one; similarly, within group 2 the 9th, 10th, and 11th share an id and the 13th gets its own.
group date want_group
1 1 2000-01-01 1
2 1 2000-01-02 1
3 1 2000-01-04 2
4 1 2000-01-05 2
5 2 2000-01-09 3
6 2 2000-01-10 3
7 2 2000-01-11 3
8 2 2000-01-13 4
Thanks,
We can use diff and cumsum to calculate the runs. The counter increments every time the difference in date is more than 1 day.
df$new <- cumsum(c(TRUE, diff(as.Date(df$date)) > 1))
df
# group date want_group new
#1 1 2000-01-01 1 1
#2 1 2000-01-02 1 1
#3 1 2000-01-04 2 2
#4 1 2000-01-05 2 2
#5 2 2000-01-09 3 3
#6 2 2000-01-10 3 3
#7 2 2000-01-11 3 3
#8 2 2000-01-13 4 4
We add TRUE at the beginning since diff returns an output one shorter than its input; this also makes cumsum start the first run at 1.
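A quick illustration of the length difference (note the dates must be converted with as.Date first):
d <- as.Date(c("2000-01-01", "2000-01-02", "2000-01-04"))
diff(d)
# Time differences in days
# [1] 1 2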
To handle this by group we can do
library(dplyr)
df %>%
  mutate(date = as.Date(date)) %>%
  group_by(group) %>%
  mutate(new = c(TRUE, diff(date) > 1)) %>%
  ungroup() %>%
  mutate(new = cumsum(new))
With base R, we can also do
df$date <- as.Date(df$date)
df$new <- with(df, cumsum(c(TRUE, date[-1] - date[-length(date)] > 1)))
df$new
#[1] 1 1 2 2 3 3 3 4
Or use the difference with lag in dplyr; since the first difference is 0 with default = first(date), we add 1 so the ids start at 1 as in the desired output:
library(dplyr)
df %>%
  mutate(date = as.Date(date),
         want_group = cumsum(date - lag(date, default = first(date)) > 1) + 1)
For a given ID, I am trying to identify the latest observation (the last wave, i.e., the highest wave number) that meets a criterion (var equal to 1 or 2).
My data:
data <- data.frame(id=c(1,1,1, 2,2,2, 3,3,3), wave=c(1,2,3, 1,2,3, 1,2,3), var=c(NA,1,2, 1,2,NA, 3,1,3))
Outcome:
outcome <- data.frame(id=c(1,1,1, 2,2,2, 3,3,3), wave=c(1,2,3, 1,2,3, 1,2,3), var=c(NA,1,2, 1,2,NA, 3,1,3), flag=c(0,0,1, 0,1,0, 0,1,0))
I can't seem to figure out how to flag only the latest/last qualifying row for a given id:
data %>% group_by(id) %>% mutate(flag = if_else(var %in% c(1, 2) & ..., 1, 0))
Subset 'wave' to the qualifying rows, get the max, compare (==) with the 'wave' column, and convert to integer:
library(dplyr)
data %>%
  group_by(id) %>%
  mutate(flag = as.integer(wave == max(wave[var %in% 1:2])))
# A tibble: 9 x 4
# Groups: id [3]
# id wave var flag
# <dbl> <dbl> <dbl> <int>
#1 1 1 NA 0
#2 1 2 1 0
#3 1 3 2 1
#4 2 1 1 0
#5 2 2 2 1
#6 2 3 NA 0
#7 3 1 3 0
#8 3 2 1 1
#9 3 3 3 0
Here, we assume that the 'wave' values are unique within each 'id'.
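If some 'id' could have no 'var' in 1:2 at all, max would be applied to an empty vector and return -Inf with a warning; a hedged variant guarding against that case:
library(dplyr)
data %>%
  group_by(id) %>%
  mutate(flag = if (any(var %in% 1:2)) {
    as.integer(wave == max(wave[var %in% 1:2]))  # flag the last qualifying wave
  } else 0L)                                     # no qualifying rows: flag all 0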
I have data with about 1000 groups; each group is ordered from 1 to 100 (the values can be any numbers within that range).
As I was looking through the data, I found that some groups had bad orders, i.e., the order would run up toward 100 and then suddenly a 24 would show up.
How can I delete all of this erroneous data?
I would like to find all rows that don't follow the increasing order within their group (before -> after) and just delete them.
Any help would be great!
lag gives the previous value, so order - lag(order) computes the difference between the current value and the previous one; the filter keeps only non-negative differences, i.e., rows where the current value is at least the previous value. The order == min(order) condition keeps the first row of each group, where lag gives NA. I keep the helper column diff to check, but you can remove it with %>% select(-diff).
library(dplyr)
df1 %>%
  group_by(gruop) %>%
  mutate(diff = order - lag(order)) %>%
  filter(diff >= 0 | order == min(order))
# A tibble: 8 x 3
# Groups: gruop [2]
gruop order diff
<int> <int> <int>
1 1 1 NA
2 1 3 2
3 1 5 2
4 1 10 5
5 2 1 NA
6 2 4 3
7 2 4 0
8 2 8 4
Data
df1 <- read.table(text="
gruop order
1 1
1 3
1 5
1 10
1 2
2 1
2 4
2 4
2 8
2 3
",header=T, stringsAsFactors = F)
Assuming the order column increments by 1 every time, we can use ave and remove the rows which do not have a difference of 1 from the previous row within their group.
df[ave(df$order, df$group, FUN = function(x) c(1, diff(x))) == 1, ]
# group order
#1 1 1
#2 1 2
#3 1 3
#4 1 4
#6 2 1
#7 2 2
#8 2 3
#9 2 4
EDIT
For the updated example, we can just change the comparison to keep non-negative differences:
df[ave(df$order, df$group, FUN = function(x) c(1, diff(x))) >= 0, ]
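Applied to the df1 from the Data block above (whose column is spelled gruop), this keeps the same eight rows as the dplyr answer:
df1[ave(df1$order, df1$gruop, FUN = function(x) c(1, diff(x))) >= 0, ]
# gruop order
#1     1     1
#2     1     3
#3     1     5
#4     1    10
#6     2     1
#7     2     4
#8     2     4
#9     2     8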
Playing with data.table:
library(data.table)
setDT(df1)[, diffo := c(1, diff(order)), group][diffo == 1, .(group, order)]
group order
1: 1 1
2: 1 2
3: 1 3
4: 1 4
5: 2 1
6: 2 2
7: 2 3
8: 2 4
Where df1 is:
df1 <- data.frame(
group = rep(1:2, each = 5),
order = c(1:4, 2, 1:4, 3)
)
EDIT
If you only need increasing order, and not steps of one, then you can do:
df3 <- transform(df1, order = c(1,3,5,10,2,1,4,7,9,3))
setDT(df3)[, diffo := c(1, diff(order)), group][diffo >= 1, .(group, order)]
group order
1: 1 1
2: 1 3
3: 1 5
4: 1 10
5: 2 1
6: 2 4
7: 2 7
8: 2 9
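The chain only prints group and order; df3 itself still carries the helper column, which can be removed by reference if unwanted:
df3[, diffo := NULL]  # data.table deletes the column in place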