Could you help me solve the problem below? In the second part of the code I exclude the DR columns whose values are all equal to 0. In the third part, however, I need to select from D1 to the last DR column so that the sum can be computed, but it gives an error. How can I fix this?
library(dplyr)
df1 <- structure(
list(date1 = c("2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28",
"2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28"),
date2 = c("2021-04-02","2021-04-02","2021-04-03","2021-04-08","2021-04-09","2021-04-10","2021-07-01","2021-07-02","2021-07-03","2021-07-03"),
Week= c("Friday","Friday","Saturday","Thursday","Friday","Saturday","Thursday","Friday","Saturday","Saturday"),
D1 = c(2,3,4,4,6,3,4,5,6,2), DR01 = c(4,1,4,3,3,4,3,6,3,2), DR02= c(4,2,6,7,3,2,7,4,4,3),DR03= c(9,5,4,3,3,2,1,5,4,3),
DR04 = c(5,4,3,3,3,6,2,1,9,2),DR05 = c(5,4,5,3,6,2,1,9,3,4),
DR06 = c(2,4,4,3,3,5,6,7,8,3),DR07 = c(2,5,4,4,9,4,7,8,3,3),
DR08 = c(0,0,0,0,1,2,0,0,0,0),DR09 = c(0,0,0,0,0,0,0,0,0,0),DR010 = c(0,0,0,0,0,0,0,0,0,0),DR011 = c(0,4,0,0,0,0,0,0,0,0), DR012 = c(0,0,0,0,0,0,0,0,0,0)),
class = "data.frame", row.names = c(NA, -10L))
df1<-df1 %>%
select(!where(~ is.numeric(.) && all(. == 0)))
df1<-df1 %>%
group_by(date1,date2, Week) %>%
select(D1:DR012) %>%
summarise_all(sum)
We can do the select before the grouping:
library(dplyr)
df1 %>%
select(date1, date2, Week, matches("^D")) %>%
group_by(date1, date2, Week) %>%
summarise(across(everything(), sum), .groups = 'drop')
-output
# A tibble: 8 × 13
date1 date2 Week D1 DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR011
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-06-28 2021-04-02 Friday 5 5 6 14 9 9 6 7 0 4
2 2021-06-28 2021-04-03 Saturday 4 4 6 4 3 5 4 4 0 0
3 2021-06-28 2021-04-08 Thursday 4 3 7 3 3 3 3 4 0 0
4 2021-06-28 2021-04-09 Friday 6 3 3 3 3 6 3 9 1 0
5 2021-06-28 2021-04-10 Saturday 3 4 2 2 6 2 5 4 2 0
6 2021-06-28 2021-07-01 Thursday 4 3 7 1 2 1 6 7 0 0
7 2021-06-28 2021-07-02 Friday 5 6 4 5 1 9 7 8 0 0
8 2021-06-28 2021-07-03 Saturday 8 5 7 7 11 7 11 6 0 0
After the first select, it is not clear why we need to select again. It is not really needed, because summarise with across(everything()) applies to every column other than the grouping columns:
df1 %>%
select(!where(~ is.numeric(.) && all(. == 0))) %>%
group_by(across(date1:Week)) %>%
summarise(across(everything(), sum), .groups = 'drop')
# A tibble: 8 × 13
date1 date2 Week D1 DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR011
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-06-28 2021-04-02 Friday 5 5 6 14 9 9 6 7 0 4
2 2021-06-28 2021-04-03 Saturday 4 4 6 4 3 5 4 4 0 0
3 2021-06-28 2021-04-08 Thursday 4 3 7 3 3 3 3 4 0 0
4 2021-06-28 2021-04-09 Friday 6 3 3 3 3 6 3 9 1 0
5 2021-06-28 2021-04-10 Saturday 3 4 2 2 6 2 5 4 2 0
6 2021-06-28 2021-07-01 Thursday 4 3 7 1 2 1 6 7 0 0
7 2021-06-28 2021-07-02 Friday 5 6 4 5 1 9 7 8 0 0
8 2021-06-28 2021-07-03 Saturday 8 5 7 7 11 7 11 6 0 0
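To make the point about grouping columns concrete, here is a minimal sketch on a hypothetical toy data frame (`toy`, `g`, `x` and `y` are illustrative names, not part of the question). It assumes dplyr 1.0 or later, where `across()` is available.

```r
library(dplyr)

# Hypothetical toy data, not the df1 from the question.
toy <- data.frame(
  g = c("a", "a", "b"),
  x = c(1, 2, 3),
  y = c(10, 20, 30)
)

res <- toy %>%
  group_by(g) %>%
  # everything() here means "every non-grouping column", so g is not summed
  summarise(across(everything(), sum), .groups = "drop")

names(res)  # g stays as the key; only x and y are summarised
```

Because the grouping columns are left out automatically, no second select is needed once the all-zero columns are gone.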
We could use summarise with across:
library(dplyr)
df1 %>%
select(!where(~ is.numeric(.) && all(. == 0))) %>%
group_by(date1,date2, Week) %>%
summarise(across(where(is.numeric), sum))
date1 date2 Week D1 DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR011
1 2021-06-28 2021-04-02 Friday 5 5 6 14 9 9 6 7 0 4
2 2021-06-28 2021-04-03 Saturday 4 4 6 4 3 5 4 4 0 0
3 2021-06-28 2021-04-08 Thursday 4 3 7 3 3 3 3 4 0 0
4 2021-06-28 2021-04-09 Friday 6 3 3 3 3 6 3 9 1 0
5 2021-06-28 2021-04-10 Saturday 3 4 2 2 6 2 5 4 2 0
6 2021-06-28 2021-07-01 Thursday 4 3 7 1 2 1 6 7 0 0
7 2021-06-28 2021-07-02 Friday 5 6 4 5 1 9 7 8 0 0
8 2021-06-28 2021-07-03 Saturday 8 5 7 7 11 7 11 6 0 0
DR012 was removed by the first select (all of its values are 0), so it no longer exists and cannot be used as the end of the range in select:
df1 %>%
select(!where(~ is.numeric(.) && all(. == 0))) %>%
names()
[1] "date1" "date2" "Week" "D1" "DR01" "DR02" "DR03" "DR04" "DR05"
[10] "DR06" "DR07" "DR08" "DR011"
Change your code to
df1 %>%
group_by(date1,date2, Week) %>%
select(D1:DR011) %>%
summarise_all(sum)
or
df1 %>%
group_by(date1,date2, Week) %>%
select(starts_with("D")) %>%
summarise_all(sum)
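If the name of the last remaining DR column is not known in advance, one option is to end the range with `last_col()`, a tidyselect helper that dplyr re-exports. The sketch below uses a small hypothetical data frame (`toy`) with the same shape as df1; it assumes the DR columns are the trailing columns of the frame.

```r
library(dplyr)

# Hypothetical toy data shaped like df1, with one all-zero DR column.
toy <- data.frame(
  date1 = c("2021-06-28", "2021-06-28", "2021-06-28"),
  Week  = c("Friday", "Friday", "Saturday"),
  D1    = c(2, 3, 4),
  DR01  = c(4, 1, 4),
  DR02  = c(0, 0, 0)
)

res <- toy %>%
  select(!where(~ is.numeric(.) && all(. == 0))) %>%  # drops DR02
  group_by(date1, Week) %>%
  select(D1:last_col()) %>%  # range ends at whichever column is last now
  summarise(across(everything(), sum), .groups = "drop")

res
```

dplyr prints a message that it re-adds the grouping variables after the select; that is expected and harmless here.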
I'm trying to find periods of 3 days or more where the values are the same. As an example, if January 1st, 2nd, and 3rd all have a value of 2, they should be included - but if January 2nd has a value of 3, then none of them should be.
I've tried a few ways so far but no luck! Any help would be greatly appreciated!
Reprex:
library("dplyr")
#Goal: include all values with values of 2 or less for 5 consecutive days and allow for a "cushion" period of values of 2 to 5 for up to 3 days
data <- data.frame(Date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-15", "2000-01-16", "2000-01-17", "2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-22", "2000-01-23", "2000-01-24", "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-29", "2000-01-30"),
Value = c(2,2,2,5,2,2,1,0,1,8,7,7,7,5,2,3,4,5,7,2,6,6,6,6,2,0,3,4,0,1))
head(data)
#Goal: values should include dates from 2000-01-01 to 2000-01-03, 2000-01-11 to 2000-01-13, and 2000-01-21 to 2000-01-24
#My attempt so far but it doesn't work
attempt1 <- data %>%
group_by(group_id = as.integer(gl(n(),3,n()))) %>% #3 day chunks
filter(Value == Value) %>% #looking for the values being the same inbetween, but this doesn't work for that
ungroup() %>%
select(-group_id)
head(attempt1)
With rle:
rl <- rle(data$Value)
data[rep(rl$lengths>=3,rl$lengths),]
Date Value
1 2000-01-01 2
2 2000-01-02 2
3 2000-01-03 2
11 2000-01-11 7
12 2000-01-12 7
13 2000-01-13 7
21 2000-01-21 6
22 2000-01-22 6
23 2000-01-23 6
24 2000-01-24 6
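For readers unfamiliar with rle, the following base R sketch shows on a short hypothetical vector (`v`) what the two pieces of the one-liner above do: rle() summarises consecutive runs, and rep() expands a per-run flag back to one flag per element.

```r
# Hypothetical short vector to show what rle() returns.
v <- c(2, 2, 2, 5, 7, 7)

rl <- rle(v)
rl$lengths  # length of each run of equal values: 3 1 2
rl$values   # the value of each run: 2 5 7

# One TRUE/FALSE per run, repeated so there is one flag per element of v.
keep <- rep(rl$lengths >= 3, rl$lengths)
keep        # TRUE TRUE TRUE FALSE FALSE FALSE

v[keep]     # 2 2 2
```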
or with dplyr:
library(dplyr)
data %>% filter(rep(rle(Value)$lengths>=3,rle(Value)$lengths))
Date Value
1 2000-01-01 2
2 2000-01-02 2
3 2000-01-03 2
4 2000-01-11 7
5 2000-01-12 7
6 2000-01-13 7
7 2000-01-21 6
8 2000-01-22 6
9 2000-01-23 6
10 2000-01-24 6
You can create a temporary variable using rleid from the data.table package.
data %>%
group_by(data.table::rleid(Value)) %>%
filter(n() >= 3) %>%
ungroup() %>%
select(Date, Value)
#> # A tibble: 10 x 2
#> Date Value
#> <chr> <dbl>
#> 1 2000-01-01 2
#> 2 2000-01-02 2
#> 3 2000-01-03 2
#> 4 2000-01-11 7
#> 5 2000-01-12 7
#> 6 2000-01-13 7
#> 7 2000-01-21 6
#> 8 2000-01-22 6
#> 9 2000-01-23 6
#> 10 2000-01-24 6
Or, if you want to avoid using another package, you could equivalently do
data %>%
group_by(temp = cumsum(c(1, diff(Value) != 0))) %>%
filter(n() > 2) %>%
ungroup() %>%
select(-temp)
#> # A tibble: 10 x 2
#> Date Value
#> <chr> <dbl>
#> 1 2000-01-01 2
#> 2 2000-01-02 2
#> 3 2000-01-03 2
#> 4 2000-01-11 7
#> 5 2000-01-12 7
#> 6 2000-01-13 7
#> 7 2000-01-21 6
#> 8 2000-01-22 6
#> 9 2000-01-23 6
#> 10 2000-01-24 6
Created on 2022-09-12 with reprex v2.0.2
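The cumsum/diff expression used as the grouping key can be unpacked step by step. This is a base R sketch on a hypothetical vector (`v`); `changed`, `run_id` and `run_len` are illustrative names. It assumes the values contain no NA, since diff() would propagate them into the key.

```r
# Hypothetical short vector.
v <- c(2, 2, 2, 5, 7, 7)

# 1 marks the first element and every position where the value changes.
changed <- c(1, diff(v) != 0)  # 1 0 0 1 1 0

# The running total gives each run of equal values its own id.
run_id <- cumsum(changed)      # 1 1 1 2 3 3

# Size of the run each element belongs to; keep runs of 3 or more.
run_len <- ave(v, run_id, FUN = length)
v[run_len >= 3]                # 2 2 2
```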
The code below generates a scatter plot with three horizontal lines, which show the mean, the mean plus one standard deviation, and the mean minus one standard deviation. To calculate these three values, all the dates in my data set are considered.
However, I would like to exclude the month of April when calculating the mean and standard deviation. How could I do that?
Executable code below:
library(dplyr)
library(tidyr)
library(lubridate)
data <- structure(
list(Id=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
date1 = c("2021-06-20","2021-06-20","2021-06-20","2021-06-20","2021-06-20",
"2021-06-20","2021-06-20","2021-06-20","2021-06-20","2021-06-20","2021-06-20",
"2021-06-20","2021-06-20","2021-06-20","2021-06-20","2021-06-20","2021-06-20",
"2021-06-20","2021-06-20","2021-06-20","2021-06-20"),
date2 = c("2021-07-01","2021-07-01","2021-07-01","2021-07-01","2021-04-02",
"2021-04-02","2021-04-02","2021-04-02","2021-04-02","2021-04-02","2021-04-03",
"2021-04-03","2021-04-03","2021-04-03","2021-04-03","2021-04-08","2021-04-08",
"2021-07-09","2021-07-09","2021-07-10","2021-07-10"),
Week= c("Thursday","Thursday","Thursday","Thursday","Friday","Friday","Friday","Friday",
"Friday","Friday","Saturday","Saturday","Saturday","Saturday","Saturday","Thursday",
"Thursday","Friday","Friday","Saturday","Saturday"),
DTPE = c("Ho","Ho","Ho","Ho","","","","","","","","","","","","","","","","Ho","Ho"),
D1 = c(8,1,9, 3,5,4,7,6,3,8,2,3,4,6,7,8,4,2,6,2,3), DR01 = c(4,1,4,3,3,4,3,6,3,7,2,3,4,6,7,8,4,2,6,7,3),
DR02 = c(8,1,4,3,3,4,1,6,3,7,2,3,4,6,7,8,4,2,6,2,3), DR03 = c(7,5,4,3,3,4,1,5,3,3,2,3,4,6,7,8,4,2,6,4,3),
DR04= c(4,5,6,7,3,2,7,4,2,1,2,3,4,6,7,8,4,2,6,4,3),DR05 = c(9,5,4,3,3,2,1,5,3,7,2,3,4,7,7,8,4,2,6,4,3)),
class = "data.frame", row.names = c(NA, -21L))
graph <- function(dt, dta = data) {
dim_data<-dim(data)
day<-c(seq.Date(from = as.Date(data$date2[1]), by = "days",
length = dim_data[1]
))
data_grouped <- data %>%
mutate(across(starts_with("date"), as.Date)) %>%
group_by(date2) %>%
summarise(Id = first(Id),
date1 = first(date1),
Week = first(Week),
DTPE = first(DTPE),
D1 = sum(D1)) %>%
select(Id,date1,date2,Week,DTPE,D1)
data_grouped %>%
mutate(DTPE = na_if(DTPE, ""))
df_OC<-subset(data_grouped, DTPE == "")
ds_CO = df_OC %>% filter(weekdays(date2) %in% weekdays(as.Date(dt)))
mean<-mean(ds_CO$D1)
sd<-sd(ds_CO$D1)
dta %>%
filter(date2 == ymd(dt)) %>%
summarize(across(starts_with("DR"), sum)) %>%
pivot_longer(everything(), names_pattern = "DR(.+)", values_to = "val") %>%
mutate(name = as.numeric(name)) %>%
plot(xlab = "Days", ylab = "Number", xlim = c(0, 45),cex=1.5,cex.lab=1.5,
cex.axis=1.5, cex.main=2, cex.sub=2, lwd=2.5, ylim = c((min(.$val) %/% 10) * 15, (max(.$val) %/% 10 + 1) * 100))
abline(h=mean, col='blue') +
abline(h=(mean + sd), col='green',lty=2)
abline(h=(mean - sd), col='orange',lty=2)
}
graph("2021-07-10",data)
data %>%
filter("04" != format(as.Date(date2), format = "%m"))
# Id date1 date2 Week DTPE D1 DR01 DR02 DR03 DR04 DR05
# 1 1 2021-06-20 2021-07-01 Thursday Ho 8 4 8 7 4 9
# 2 1 2021-06-20 2021-07-01 Thursday Ho 1 1 1 5 5 5
# 3 1 2021-06-20 2021-07-01 Thursday Ho 9 4 4 4 6 4
# 4 1 2021-06-20 2021-07-01 Thursday Ho 3 3 3 3 7 3
# 5 1 2021-06-20 2021-07-09 Friday 2 2 2 2 2 2
# 6 1 2021-06-20 2021-07-09 Friday 6 6 6 6 6 6
# 7 1 2021-06-20 2021-07-10 Saturday Ho 2 7 2 4 4 4
# 8 1 2021-06-20 2021-07-10 Saturday Ho 3 3 3 3 3 3
I recommend that you permanently convert date1 and date2 to proper Date objects in the frame instead of converting them every time you do something. While the conversion is relatively inexpensive, it is also unnecessary, and the consequence of forgetting it might be subtle differences in the results (e.g., the column being treated as a categorical variable rather than a continuous or discrete-ordinal one).
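A minimal sketch of that one-time conversion, using the same mutate/across pattern that already appears inside the question's graph function. The data frame `toy` is a hypothetical stand-in for data.

```r
library(dplyr)

# Hypothetical stand-in for the question's data frame.
toy <- data.frame(
  date1 = c("2021-06-20", "2021-06-20"),
  date2 = c("2021-07-01", "2021-04-02"),
  D1    = c(8, 5)
)

# Convert every column whose name starts with "date" once, up front.
toy <- toy %>% mutate(across(starts_with("date"), as.Date))

# Later steps can then use the Date column directly.
res <- toy %>% filter(format(date2, "%m") != "04")
res
```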
Since you already use lubridate, you could apply its month function:
data %>%
filter(month(date2) != 4)
Id date1 date2 Week DTPE D1 DR01 DR02 DR03 DR04 DR05
1 1 2021-06-20 2021-07-01 Thursday Ho 8 4 8 7 4 9
2 1 2021-06-20 2021-07-01 Thursday Ho 1 1 1 5 5 5
3 1 2021-06-20 2021-07-01 Thursday Ho 9 4 4 4 6 4
4 1 2021-06-20 2021-07-01 Thursday Ho 3 3 3 3 7 3
5 1 2021-06-20 2021-07-09 Friday 2 2 2 2 2 2
6 1 2021-06-20 2021-07-09 Friday 6 6 6 6 6 6
7 1 2021-06-20 2021-07-10 Saturday Ho 2 7 2 4 4 4
8 1 2021-06-20 2021-07-10 Saturday Ho 3 3 3 3 3 3
Using substr
subset(data, substr(date2, 6, 7 ) != '04')
-output
Id date1 date2 Week DTPE D1 DR01 DR02 DR03 DR04 DR05
1 1 2021-06-20 2021-07-01 Thursday Ho 8 4 8 7 4 9
2 1 2021-06-20 2021-07-01 Thursday Ho 1 1 1 5 5 5
3 1 2021-06-20 2021-07-01 Thursday Ho 9 4 4 4 6 4
4 1 2021-06-20 2021-07-01 Thursday Ho 3 3 3 3 7 3
18 1 2021-06-20 2021-07-09 Friday 2 2 2 2 2 2
19 1 2021-06-20 2021-07-09 Friday 6 6 6 6 6 6
20 1 2021-06-20 2021-07-10 Saturday Ho 2 7 2 4 4 4
21 1 2021-06-20 2021-07-10 Saturday Ho 3 3 3 3 3 3
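Any of the filters above can be applied just before the mean and standard deviation are computed, leaving the plotted data untouched. The sketch below shows the idea on a hypothetical toy frame; `toy`, `daily`, `m` and `s` are illustrative names, and it deliberately leaves out the DTPE and weekday conditions from the question's graph function.

```r
library(dplyr)

# Hypothetical toy data with a mix of April and non-April dates.
toy <- data.frame(
  date2 = as.Date(c("2021-07-01", "2021-07-01", "2021-04-02",
                    "2021-04-03", "2021-07-09")),
  D1    = c(8, 1, 5, 2, 6)
)

daily <- toy %>%
  filter(format(date2, "%m") != "04") %>%  # drop April before aggregating
  group_by(date2) %>%
  summarise(D1 = sum(D1), .groups = "drop")

# Statistics for the horizontal lines, computed without April.
m <- mean(daily$D1)
s <- sd(daily$D1)
```

Inside graph(), the same filter step would sit in the pipeline that builds the frame from which mean and sd are taken.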
I would like to calculate the median of my PV variable per weekday. For example, PV contains three Fridays, so the Friday median should be computed from the data of those three Fridays.
Thanks!
library(dplyr)
df <- structure(
list(Id=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
date1 = c("2021-07-01","2021-07-01","2021-07-01","2021-07-01","2021-04-02",
"2021-04-02","2021-04-02","2021-04-02","2021-04-02","2021-04-02","2021-04-03",
"2021-04-03","2021-04-03","2021-04-03","2021-04-03","2021-04-08","2021-04-08",
"2021-04-07","2021-04-09","2021-04-10","2021-04-10"),
Week= c("Thursday","Thursday","Thursday","Thursday","Friday","Friday","Friday","Friday",
"Friday","Friday","Saturday","Saturday","Saturday","Saturday","Saturday","Thursday",
"Thursday","Friday","Friday","Saturday","Saturday"),
DTPE = c("Ho","Ho","Ho","Ho","","","","","","","","","","","","","","","","Ho","Ho"),
D1 = c(8,1,9, 3,5,4,7,6,3,8,2,3,4,6,7,8,8,6,16,2,3), DR01 = c(4,1,4,3,3,4,3,6,3,7,2,3,4,6,7,8,9,2,6,7,3),
DR02 = c(4,1,4,3,3,4,1,6,3,7,6,6,4,6,7,8,4,2,6,2,3), DR03 = c(7,5,4,3,6,4,1,5,3,6,2,3,4,9,7,8,4,2,6,4,3),
DR04= c(9,5,6,7,3,2,7,4,2,1,5,3,4,6,7,8,4,7,7,4,3),DR05 = c(9,5,4,3,3,7,1,5,3,7,2,3,4,7,7,8,4,2,6,4,3)),
class = "data.frame", row.names = c(NA, -21L))
df<-df %>%
group_by(Id, date1, Week) %>%
select(D1:DR05) %>%
summarise_all(sum)
x<-subset(df, select = DR01:DR05)
x<-cbind(df, setNames(df$D1 - x, paste0(names(x), "_PV")))
PV<-select(x, date1, Week,ends_with("PV"))
PV
Id date1 Week DR01_PV DR02_PV DR03_PV DR04_PV DR05_PV
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2021-04-02 Friday 7 9 8 14 7
2 1 2021-04-03 Saturday 0 -7 -3 -3 -1
3 1 2021-04-07 Friday 4 4 4 -1 4
4 1 2021-04-08 Thursday -1 4 4 4 4
5 1 2021-04-09 Friday 10 10 10 9 10
6 1 2021-04-10 Saturday -5 0 -2 -2 -2
7 1 2021-07-01 Thursday 9 9 2 -6 0
x %>%
group_by(Week) %>%
summarize(across(ends_with("_PV"), median))
# # A tibble: 3 x 6
# Week DR01_PV DR02_PV DR03_PV DR04_PV DR05_PV
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Friday 7 9 8 9 7
# 2 Saturday -2.5 -3.5 -2.5 -2.5 -1.5
# 3 Thursday 4 6.5 3 -1 2
If you want to combine all columns, one way is
PV %>%
ungroup() %>%
select(Week, ends_with("PV")) %>%
tidyr::pivot_longer(-Week) %>%
group_by(Week) %>%
summarize(Med = median(value))
# # A tibble: 3 x 2
# Week Med
# <chr> <dbl>
# 1 Friday 8
# 2 Saturday -2
# 3 Thursday 4
How do I delete the April dates that are in the date2 column? Here is a small example, but I have a much larger database. So, would I be able to do this quickly?
Thanks!
data <- structure(
list(Id=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
date1 = c("2021-06-20","2021-06-20","2021-06-20","2021-06-20","2021-06-20",
"2021-06-20","2021-06-20","2021-06-20","2021-06-20","2021-06-20","2021-06-20",
"2021-06-20","2021-06-20","2021-06-20","2021-06-20","2021-06-20","2021-06-20",
"2021-06-20","2021-06-20","2021-06-20","2021-06-20"),
date2 = c("2021-07-01","2021-07-01","2021-07-01","2021-07-01","2021-04-02",
"2021-04-02","2021-06-02","2021-04-02","2021-04-02","2021-04-02","2021-04-03",
"2021-05-03","2021-06-03","2021-04-03","2021-04-03","2021-04-08","2021-04-08",
"2021-06-09","2021-05-09","2021-08-10","2021-06-10"),
DR01= c(4,5,6,7,3,2,7,4,2,1,2,3,4,6,7,8,4,2,6,4,3),DR02 = c(9,5,4,3,3,2,1,5,3,7,2,3,4,7,7,8,4,2,6,4,3)),
class = "data.frame", row.names = c(NA, -21L))
We could use the month function from lubridate and then filter:
library(dplyr)
library(lubridate)
data %>%
filter(month(date2)!=4)
Id date1 date2 DR01 DR02
1 1 2021-06-20 2021-07-01 4 9
2 1 2021-06-20 2021-07-01 5 5
3 1 2021-06-20 2021-07-01 6 4
4 1 2021-06-20 2021-07-01 7 3
5 1 2021-06-20 2021-06-02 7 1
6 1 2021-06-20 2021-05-03 3 3
7 1 2021-06-20 2021-06-03 4 4
8 1 2021-06-20 2021-06-09 2 2
9 1 2021-06-20 2021-05-09 6 6
10 1 2021-06-20 2021-08-10 4 4
11 1 2021-06-20 2021-06-10 3 3
Extract the month part after converting to Date class and use !=
data2 <- subset(data, format(as.Date(date2), '%m') != '04')
-output
data2
Id date1 date2 DR01 DR02
1 1 2021-06-20 2021-07-01 4 9
2 1 2021-06-20 2021-07-01 5 5
3 1 2021-06-20 2021-07-01 6 4
4 1 2021-06-20 2021-07-01 7 3
7 1 2021-06-20 2021-06-02 7 1
12 1 2021-06-20 2021-05-03 3 3
13 1 2021-06-20 2021-06-03 4 4
18 1 2021-06-20 2021-06-09 2 2
19 1 2021-06-20 2021-05-09 6 6
20 1 2021-06-20 2021-08-10 4 4
21 1 2021-06-20 2021-06-10 3 3
Another option without using any dates:
data[!grepl("-04-", data$date2), ]
We interpret date2 as a string and keep every row whose value does not contain "-04-". This returns
Id date1 date2 DR01 DR02
1 1 2021-06-20 2021-07-01 4 9
2 1 2021-06-20 2021-07-01 5 5
3 1 2021-06-20 2021-07-01 6 4
4 1 2021-06-20 2021-07-01 7 3
7 1 2021-06-20 2021-06-02 7 1
12 1 2021-06-20 2021-05-03 3 3
13 1 2021-06-20 2021-06-03 4 4
18 1 2021-06-20 2021-06-09 2 2
19 1 2021-06-20 2021-05-09 6 6
20 1 2021-06-20 2021-08-10 4 4
21 1 2021-06-20 2021-06-10 3 3
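The Date-based and string-based filters agree as long as date2 is stored as "YYYY-MM-DD" text. The base R sketch below checks this on a hypothetical toy frame (`toy`) that includes a date whose day, not month, is 04, which is the case a careless pattern could get wrong.

```r
# Hypothetical toy data; the third row has day 04 but month 05.
toy <- data.frame(
  date2 = c("2021-07-01", "2021-04-02", "2021-05-04", "2021-04-30"),
  DR01  = c(4, 3, 7, 2),
  stringsAsFactors = FALSE
)

by_date   <- toy[format(as.Date(toy$date2), "%m") != "04", ]
by_substr <- toy[substr(toy$date2, 6, 7) != "04", ]
by_grepl  <- toy[!grepl("-04-", toy$date2), ]  # needs hyphens on both sides

identical(by_date, by_substr) && identical(by_date, by_grepl)
```

The string versions avoid parsing dates, which may be faster on a large table, but they silently depend on the text format staying the same.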
I would like to create a new data set from the df data frame below. My idea is to have only one row per day. For example, instead of 4 rows for 2021-07-01 there would be only 1, with the column values for that day added together.
df <- structure(
list(Id=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
date1 = c("2021-07-01","2021-07-01","2021-07-01","2021-07-01","2021-04-02",
"2021-04-02","2021-04-02","2021-04-02","2021-04-02","2021-04-02","2021-04-03",
"2021-04-03","2021-04-03","2021-04-03","2021-04-03","2021-04-08","2021-04-08",
"2021-04-07","2021-04-09","2021-04-10","2021-04-10"),
Week= c("Thursday","Thursday","Thursday","Thursday","Friday","Friday","Friday","Friday",
"Friday","Friday","Saturday","Saturday","Saturday","Saturday","Saturday","Thursday",
"Thursday","Friday","Friday","Saturday","Saturday"),
DTPE = c("Ho","Ho","Ho","Ho","","","","","","","","","","","","","","","","Ho","Ho"),
D1 = c(8,1,9, 3,5,4,7,6,3,8,2,3,4,6,7,8,4,2,6,2,3), DR01 = c(4,1,4,3,3,4,3,6,3,7,2,3,4,6,7,8,4,2,6,7,3),
DR02 = c(8,1,4,3,3,4,1,6,3,7,2,3,4,6,7,8,4,2,6,2,3), DR03 = c(7,5,4,3,3,4,1,5,3,3,2,3,4,6,7,8,4,2,6,4,3),
DR04= c(4,5,6,7,3,2,7,4,2,1,2,3,4,6,7,8,4,2,6,4,3),DR05 = c(9,5,4,3,3,2,1,5,3,7,2,3,4,7,7,8,4,2,6,4,3)),
class = "data.frame", row.names = c(NA, -21L))
> df
Id date1 Week DTPE D1 DR01 DR02 DR03 DR04 DR05
1 1 2021-07-01 Thursday Ho 8 4 8 7 4 9
2 1 2021-07-01 Thursday Ho 1 1 1 5 5 5
3 1 2021-07-01 Thursday Ho 9 4 4 4 6 4
4 1 2021-07-01 Thursday Ho 3 3 3 3 7 3
5 1 2021-04-02 Friday 5 3 3 3 3 3
6 1 2021-04-02 Friday 4 4 4 4 2 2
7 1 2021-04-02 Friday 7 3 1 1 7 1
8 1 2021-04-02 Friday 6 6 6 5 4 5
9 1 2021-04-02 Friday 3 3 3 3 2 3
10 1 2021-04-02 Friday 8 7 7 3 1 7
11 1 2021-04-03 Saturday 2 2 2 2 2 2
12 1 2021-04-03 Saturday 3 3 3 3 3 3
13 1 2021-04-03 Saturday 4 4 4 4 4 4
14 1 2021-04-03 Saturday 6 6 6 6 6 7
15 1 2021-04-03 Saturday 7 7 7 7 7 7
16 1 2021-04-08 Thursday 8 8 8 8 8 8
17 1 2021-04-08 Thursday 4 4 4 4 4 4
18 1 2021-04-07 Friday 2 2 2 2 2 2
19 1 2021-04-09 Friday 6 6 6 6 6 6
20 1 2021-04-10 Saturday Ho 2 7 2 4 4 4
21 1 2021-04-10 Saturday Ho 3 3 3 3 3 3
We can group by 'Id', 'date1' and 'Week', then use across inside summarise to sum the numeric columns:
library(dplyr)
df %>% group_by(Id, date1, Week) %>%
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), .groups = 'drop')
You can perform this using the following code:
library(dplyr)
df %>%
group_by(Id, date1, Week) %>%
select(D1:DR05) %>%
summarise_all(sum)
# A tibble: 7 × 9
# Groups: Id, date1 [7]
Id date1 Week D1 DR01 DR02 DR03 DR04 DR05
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2021-04-02 Friday 33 26 24 19 19 21
2 1 2021-04-03 Saturday 22 22 22 22 22 23
3 1 2021-04-07 Friday 2 2 2 2 2 2
4 1 2021-04-08 Thursday 12 12 12 12 12 12
5 1 2021-04-09 Friday 6 6 6 6 6 6
6 1 2021-04-10 Saturday 5 10 5 7 7 7
7 1 2021-07-01 Thursday 21 12 16 19 22 21
You might also want to convert the date1 field to a Date object; you can do that with a lubridate parser such as ymd() inside a mutate.
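A short sketch of that conversion combined with the grouping, on a hypothetical toy frame (`toy`) rather than the full df. It assumes date1 is always in year-month-day order, which is what ymd() parses.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data shaped like df.
toy <- data.frame(
  Id    = c(1, 1, 1),
  date1 = c("2021-07-01", "2021-07-01", "2021-04-02"),
  D1    = c(8, 1, 5)
)

res <- toy %>%
  mutate(date1 = ymd(date1)) %>%  # character -> Date
  group_by(Id, date1) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")

res
```

With date1 stored as a Date, the groups sort chronologically rather than alphabetically, which only coincides for ISO-formatted strings.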
Base R with aggregate:
aggregate(cbind(D1, DR01, DR02, DR03, DR04, DR05) ~ Id+date1+Week, df, sum)
Output:
Id date1 Week D1 DR01 DR02 DR03 DR04 DR05
1 1 2021-04-02 Friday 33 26 24 19 19 21
2 1 2021-04-07 Friday 2 2 2 2 2 2
3 1 2021-04-09 Friday 6 6 6 6 6 6
4 1 2021-04-03 Saturday 22 22 22 22 22 23
5 1 2021-04-10 Saturday 5 10 5 7 7 7
6 1 2021-04-08 Thursday 12 12 12 12 12 12
7 1 2021-07-01 Thursday 21 12 16 19 22 21