How do I apply a function to many columns of grouped rows? For example:
library(tidyverse)
data <- tribble(
~Date, ~Seq1, ~Component, ~Seq2, ~X1, ~X2, ~X3,
"01/01/18", 1, "Smooth", NA, 3.98, 2.75, 1.82,
"01/01/18", 2, "Smooth", NA, 1.02, 0.02, -0.04,
"01/01/18", 3, "Smooth", NA, 3.48, 3.06, 1.25,
"01/01/18", 3, "Bounce", 1, 2.01, -0.43, -0.52,
"01/01/18", 3, "Bounce", 2, 1.94, 1.53, 1.92) %>%
mutate(across(c(Date, Seq1, Component, Seq2), factor))
Each X column (there are many more columns, truncated here for clarity) is grouped by Date, Seq1, Component, and Seq2. Within Component "Smooth", Seq2 is always NA, while within Component "Bounce" there are multiple Seq2 levels, e.g. "1", "2", etc.
How do I sum each X column so that the Smooth row (Seq2 = NA) is added to every Bounce row (each Seq2 level) with the same Date and Seq1?
The desired result is:
expected <- tribble(
~Date, ~Seq1, ~Component, ~Seq2, ~X1, ~X2, ~X3,
"01/01/18", 1, "Smooth", NA, 3.98, 2.75, 1.82,
"01/01/18", 2, "Smooth", NA, 1.02, 0.02, -0.04,
"01/01/18", 3, "Smooth", NA, 3.48, 3.06, 1.25,
"01/01/18", 3, "Bounce", 1, 5.49, 2.63, 0.73,
"01/01/18", 3, "Bounce", 2, 5.42, 4.59, 3.17)
The following attempt only sums within each Seq1 level, so all three Seq1 = 3 rows get the same total:
data %>%
group_by(Date, Seq1) %>%
mutate(across(starts_with("X"), sum))
#> # A tibble: 5 x 7
#> # Groups: Date, Seq1 [3]
#> Date Seq1 Component Seq2 X1 X2 X3
#> <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 01/01/18 1 Smooth <NA> 3.98 2.75 1.82
#> 2 01/01/18 2 Smooth <NA> 1.02 0.02 -0.04
#> 3 01/01/18 3 Smooth <NA> 7.43 4.16 2.65
#> 4 01/01/18 3 Bounce 1 7.43 4.16 2.65
#> 5 01/01/18 3 Bounce 2 7.43 4.16 2.65
I am certain there is a solution within the purrr or apply family of functions; however, I have been unsuccessful (for days) in solving this example. The actual data has about 180 X columns, hundreds of Date and Seq1 combinations, and multiple Seq2 levels.
Similar questions include "Summing Multiple Groups of Columns", "How to apply a function to a subset of columns in r?", and perhaps https://github.com/jennybc/row-oriented-workflows.
Created on 2018-10-23 by the reprex package (v0.2.1)
Here's my solution. This is not really a purrr task, because there is no single function you want to map over the columns. Instead, as I understand the problem, you want to match each X value in a Bounce row with the corresponding X value in the Smooth row that has the same Date and Seq1 (and there is only one such row). That makes it a joining problem: set up the join so the right values line up, then do the sum. The steps are:
1. Split the data into the Smooth rows and the Bounce rows, and gather each so that all the X values are in one column.
2. Join the smooths onto the bounces with a left_join, so each original Bounce row now has its corresponding Smooth value.
3. mutate the sum into a new column, and select/rename the columns to match the original.
4. bind_rows to recombine the summed bounces with the smooths, and spread to return to the original layout.
This should be robust to any number of Date, Seq1, Seq2 and X values.
library(tidyverse)
data <- tribble(
~Date, ~Seq1, ~Component, ~Seq2, ~X1, ~X2, ~X3,
"01/01/18", 1, "Smooth", NA, 3.98, 2.75, 1.82,
"01/01/18", 2, "Smooth", NA, 1.02, 0.02, -0.04,
"01/01/18", 3, "Smooth", NA, 3.48, 3.06, 1.25,
"01/01/18", 3, "Bounce", 1, 2.01, -0.43, -0.52,
"01/01/18", 3, "Bounce", 2, 1.94, 1.53, 1.92)
smooths <- data %>%
filter(Component == "Smooth") %>%
gather(X, val, starts_with("X"))
bounces <- data %>%
filter(Component == "Bounce") %>%
gather(X, val, starts_with("X")) %>%
left_join(smooths, by = c("Date", "Seq1", "X")) %>%
mutate(val = val.x + val.y) %>%
select(Date, Seq1, Component = Component.x, Seq2 = Seq2.x, X, val)
bounces %>%
bind_rows(smooths) %>%
spread(X, val)
#> # A tibble: 5 x 7
#> Date Seq1 Component Seq2 X1 X2 X3
#> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01/01/18 1 Smooth NA 3.98 2.75 1.82
#> 2 01/01/18 2 Smooth NA 1.02 0.02 -0.04
#> 3 01/01/18 3 Bounce 1 5.49 2.63 0.73
#> 4 01/01/18 3 Bounce 2 5.42 4.59 3.17
#> 5 01/01/18 3 Smooth NA 3.48 3.06 1.25
Created on 2018-10-31 by the reprex package (v0.2.1)
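If you would rather skip the gather/join/spread round trip, here is a minimal sketch of an alternative (not the approach above) that does the addition directly with a grouped mutate() and across(), which needs dplyr >= 1.0.0. It assumes, as above, that every Date/Seq1 group contains exactly one Smooth row; a group without one would make the subsetting return a zero-length vector and the call would fail.

```r
library(dplyr)

data <- tibble::tribble(
  ~Date, ~Seq1, ~Component, ~Seq2, ~X1, ~X2, ~X3,
  "01/01/18", 1, "Smooth", NA, 3.98, 2.75, 1.82,
  "01/01/18", 2, "Smooth", NA, 1.02, 0.02, -0.04,
  "01/01/18", 3, "Smooth", NA, 3.48, 3.06, 1.25,
  "01/01/18", 3, "Bounce", 1, 2.01, -0.43, -0.52,
  "01/01/18", 3, "Bounce", 2, 1.94, 1.53, 1.92
)

result <- data %>%
  group_by(Date, Seq1) %>%
  # For every X column: Bounce rows get the group's single Smooth value added,
  # Smooth rows are returned unchanged.
  mutate(across(
    starts_with("X"),
    ~ if_else(Component == "Bounce", .x + .x[Component == "Smooth"], .x)
  )) %>%
  ungroup()

result
```

Because across() takes a tidyselect specification, the same call should cover all of the ~180 X columns without listing them.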
Related
OK, I have been trying to find an answer to this but can't find it anywhere, even though it seems like an easy task (which is bugging me even more!).
I have a data frame with a column of numbers, which I want to filter to get the first (smallest) occurrence of each integer part. For example, if I have 1.01, 1.08, and 1.15, I want to keep only the row with the value 1.01 in that column.
An example is:
x<- c(2.04, 2.25, 3.99, 3.20, 2.60, 1.85, 3.57, 3.37, 2.59, 1.60, 3.93, 1.33, 1.08, 4.64, 2.09, 4.53, 3.04, 3.85, 3.15, 3.97)
y<- c(2.62, 2.48, 1.40, 2.27, 3.71, 1.86, 3.56, 2.08, 2.36, 3.23, 1.65, 3.43, 1.57, 4.49, 2.29, 3.32, 2.12, 4.45, 1.57, 4.70)
z <- data.frame(x, y)
z <- z[order(z$x, decreasing = FALSE), ]
And the filtered results should be:
x y
1.08 1.57
2.04 2.62
3.04 2.12
4.53 3.32
Any help would be appreciated.
library(dplyr)
z %>%
arrange(x) %>%
group_by(int = floor(x)) %>%
slice(1) %>%
ungroup()
# A tibble: 4 × 3
x y int
<dbl> <dbl> <dbl>
1 1.08 1.57 1
2 2.04 2.62 2
3 3.04 2.12 3
4 4.53 3.32 4
or
z %>%
arrange(x) %>%
filter(floor(x) != lag(floor(x), default = 0))
x y
1 1.08 1.57
2 2.04 2.62
3 3.04 2.12
4 4.53 3.32
You can also try this:
z1 <- z %>%
arrange(x) %>%
group_by(floor(x)) %>%
filter(row_number() == 1)
z1
# A tibble: 4 × 3
# Groups: floor(x) [4]
x y `floor(x)`
<dbl> <dbl> <dbl>
1 1.08 1.57 1
2 2.04 2.62 2
3 3.04 2.12 3
4 4.53 3.32 4
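A base R alternative, sketched under the same assumption as the question (z sorted by x in increasing order): duplicated() flags every row whose integer part has already appeared, so negating it keeps the first, i.e. smallest, row for each integer part.

```r
x <- c(2.04, 2.25, 3.99, 3.20, 2.60, 1.85, 3.57, 3.37, 2.59, 1.60,
       3.93, 1.33, 1.08, 4.64, 2.09, 4.53, 3.04, 3.85, 3.15, 3.97)
y <- c(2.62, 2.48, 1.40, 2.27, 3.71, 1.86, 3.56, 2.08, 2.36, 3.23,
       1.65, 3.43, 1.57, 4.49, 2.29, 3.32, 2.12, 4.45, 1.57, 4.70)
z <- data.frame(x, y)

# Sort so the smallest value of each integer part comes first
z <- z[order(z$x), ]

# Keep the first row seen for each value of floor(x)
first_per_int <- z[!duplicated(floor(z$x)), ]
first_per_int
```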
I'm trying to reshape a data frame from long to wide format by converting selected rows to columns. Here is the current long-format structure:
data_long <- data.frame(
id = c("kelp","kelp","fish","fish","beach","beach","kelp","kelp","fish","fish","beach","beach"),
desig = c("mpa","reference","mpa","reference","mpa","reference","mpa","reference","mpa","reference","mpa","reference"),
indicator = c("density","density","density","density","density","density","biomass","biomass","biomass","biomass","biomass","biomass"),
n = c(1118,1118,1118,1118,1118,1118,1118,1118,1118,1118,1118,1118),
m = c(0.35, 4.28, 1.16, 106.35, 13.44,0.63,0.35, 4.28, 1.16, 106.35, 13.44,0.63),
sd = c(1.19, 8.48, 4.25, 118, 31.77,2.79,1.19, 8.48, 4.25, 118, 31.77,2.79)
)
data_long
I want to keep id and indicator, split by desig, and move n, m, and sd into new columns (one set per desig level). The final data frame structure I'm trying to obtain is:
data_wide <- data.frame(
id = c("kelp","fish","beach","kelp","fish","beach"),
indicator = c("density","density","density","biomass","biomass","biomass"),
mpa.n = c(1118,1118,1118,1118,1118,1118),
mpa.m = c(0.35, 1.16, 13.44, 0.35, 1.16, 13.44),
mpa.sd = c(1.19, 4.25, 31.77, 1.19, 4.25, 31.77),
reference.n = c(1118,1118,1118,1118,1118,1118),
reference.m = c(4.28, 106.35, 0.63, 4.28, 106.35, 0.63),
reference.sd = c(8.48, 118, 2.79, 8.48, 118, 2.79)
)
data_wide
I can't seem to get this right using reshape2. Any suggestions?
We can use pivot_wider from tidyr:
library(tidyr)
library(dplyr)
pivot_wider(data_long, names_from = desig,
values_from = c(n, m, sd), names_glue = "{desig}.{.value}") %>%
select(id, indicator, starts_with("mpa"), starts_with('reference'))
Output:
# A tibble: 6 × 8
id indicator mpa.n mpa.m mpa.sd reference.n reference.m reference.sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 kelp density 1118 0.35 1.19 1118 4.28 8.48
2 fish density 1118 1.16 4.25 1118 106. 118
3 beach density 1118 13.4 31.8 1118 0.63 2.79
4 kelp biomass 1118 0.35 1.19 1118 4.28 8.48
5 fish biomass 1118 1.16 4.25 1118 106. 118
6 beach biomass 1118 13.4 31.8 1118 0.63 2.79
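For comparison, a base R sketch with stats::reshape(), which needs no extra packages. reshape() names the new columns value.time (e.g. n.mpa), so the final sub() call, an extra renaming step of mine, flips them into the mpa.n order the question asks for.

```r
data_long <- data.frame(
  id = c("kelp","kelp","fish","fish","beach","beach","kelp","kelp","fish","fish","beach","beach"),
  desig = c("mpa","reference","mpa","reference","mpa","reference","mpa","reference","mpa","reference","mpa","reference"),
  indicator = c("density","density","density","density","density","density","biomass","biomass","biomass","biomass","biomass","biomass"),
  n = c(1118,1118,1118,1118,1118,1118,1118,1118,1118,1118,1118,1118),
  m = c(0.35, 4.28, 1.16, 106.35, 13.44,0.63,0.35, 4.28, 1.16, 106.35, 13.44,0.63),
  sd = c(1.19, 8.48, 4.25, 118, 31.77,2.79,1.19, 8.48, 4.25, 118, 31.77,2.79)
)

wide <- reshape(data_long,
                direction = "wide",
                idvar = c("id", "indicator"),  # one output row per id + indicator
                timevar = "desig",             # one set of columns per desig level
                v.names = c("n", "m", "sd"))   # the value columns to spread

# Rename "n.mpa" -> "mpa.n", "sd.reference" -> "reference.sd", etc.
# (id and indicator do not match the pattern and are left alone)
names(wide) <- sub("^(n|m|sd)\\.(.+)$", "\\2.\\1", names(wide))
rownames(wide) <- NULL
wide
```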
I want to compute the mean exposure to ozone from a dataset like the example below. The mean should be taken over the ozone values from the year of birth up to age 5. Is there a simple way to do this in R?
final = data.frame(ID = c(1, 2, 3, 4, 5, 6),
Zone = c("A", "B", "C", "D", "A", "B"),
dob = c(1993, 1997, 1994, 2001, 1999, 1993),
Ozone_1993 = c(0.12, 0.01, 0.36, 0.78, 0.12, 0.01),
Ozone_1994 = c(0.75, 0.23, 0.14, 0.98, 0.75, 0.23),
Ozone_1995 = c(1.38, 0.45, -0.08, 1.18, 1.38, 0.45),
Ozone_1996 = c(2.01, 0.67, -0.3, 1.38, 2.01, 0.67),
Ozone_1997 = c(2.64, 0.89, -0.52, 1.58, 2.64, 0.89),
Ozone_1998 = c(3.27, 1.11, -0.74, 1.78, 3.27, 1.11),
Ozone_1999 = c(3.9, 1.33, -0.96, 1.98, 3.9, 1.33),
Ozone_2000 = c(4.53, 1.55, -1.18, 2.18, 4.53, 1.55),
Ozone_2001 = c(5.16, 1.77, -1.4, 2.38, 5.16, 1.77),
Ozone_2002 = c(5.79, 1.99, -1.62, 2.58, 5.79, 1.99),
Ozone_2003 = c(6.42, 2.21, -1.84, 2.78, 6.42, 2.21),
Ozone_2004 = c(7.05, 2.43, -2.06, 2.98, 7.05, 2.43),
mean_under5_ozone = c(0.85, 1.33, -0.3, 2.68, 5.16, 0.45))
where mean_under5_ozone is the mean ozone exposure from the birth year up to age 5 (or less); e.g. mean_under5_ozone for ID 1 is the row mean from Ozone_1993 to Ozone_1997.
From a novice,
Here is one way to do it with for loops. (It's not very elegant, but it avoids getting into too much detail of dplyr and rlang syntax.)
1. Loop over birth years (dob_yr below) to define a column containing the variable names to use for the custom mean (use_vars below).
2. Loop over rows and, for each row, extract the relevant variables using this new column (use_vars) and calculate the custom mean.
library(dplyr)
df <- tibble(id=1:5)
df$zone <- c(rep('A', 5))
df$dob_yr <- c(1991:1995)
for (yr in 1991:1995) {
df[[paste('x_',yr,sep='')]] <- c(abs(rnorm(5)))
}
df # check mock data
add_use_vars <- function(df, dob_yr_varname='dob_yr', prefix='x_', yr_within=3) {
vars <- names(df %>% select(starts_with(prefix)))
vars_yr <- as.integer(sub(prefix, '', vars))
df$use_vars <- vector("list", nrow(df)) # list column: one vector of column names per row
for (i in seq_along(df[[dob_yr_varname]])) {
yr <- df[[dob_yr_varname]][i]
idx <- (vars_yr <= yr + yr_within) & (vars_yr >= yr) # years from dob to dob + yr_within
df$use_vars[[i]] <- vars[idx]
}
return(df)
}
df <- add_use_vars(df)
df$use_vars[[1]] # see the first row in use_vars
custom_mean <- function(df, varname_varlist='use_vars') {
df$custom_mean <- NA
for (i in seq_along(df[[varname_varlist]])) {
vars <- df[[varname_varlist]][[i]] # column names to average for row i
df$custom_mean[i] <- mean(as.numeric(df[i, vars]))
}
return(df)
}
df <- custom_mean(df)
df # see results
Note that for this mock data, each row is averaged over the columns from 0 to 3 years after the birth year (yr_within = 3). For the question's data, pass dob_yr_varname = 'dob', prefix = 'Ozone_', and yr_within = 5.
I don't think I understand what mean_under5_ozone means, since I can't reproduce your numbers. For instance, ID 1 was born in 1993, so we want data from 1993 through 1998 (to include age 5) or through 1997 (up to but not including age 5), but neither of those averages is 0.85:
mean(unlist(final[1, 4:9]))
# [1] 1.695
mean(unlist(final[1, 4:8]))
# [1] 1.38
Setting that aside, here is what I think the correct answers are for your final data, taking the window as the birth year through age 5 inclusive.
tidyverse
library(dplyr)
library(tidyr) # pivot_longer
final <- select(final, -mean_under5_ozone)
final %>%
pivot_longer(starts_with("Ozone"), names_pattern = "(.*)_(.*)", names_to = c("type", "year")) %>%
mutate(year = as.integer(year)) %>%
group_by(ID) %>%
summarize(mean_under5_ozone = mean(value[ between(year, dob, dob + 5) ]), .groups = "drop")
# # A tibble: 6 x 2
# ID mean_under5_ozone
# <dbl> <dbl>
# 1 1 1.70
# 2 2 1.44
# 3 3 -0.41
# 4 4 2.68
# 5 5 5.48
# 6 6 0.56
data.table
library(data.table)
library(magrittr) # %>%, not required but used for improved readability
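finalDT <- as.data.table(final)
if ("mean_under5_ozone" %in% names(finalDT)) finalDT[, mean_under5_ozone := NULL] # drop the question's column if still present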
melt(finalDT, 1:3) %>%
.[, year := as.integer(gsub("[^0-9]", "", variable))] %>%
.[ year >= dob, ] %>%
.[, .(mean_under5_ozone = mean(value[ between(year, dob, dob + 5) ])), by = .(ID)] %>%
.[order(ID),]
# ID mean_under5_ozone
# 1: 1 1.695
# 2: 2 1.440
# 3: 3 -0.410
# 4: 4 2.680
# 5: 5 5.475
# 6: 6 0.560
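A base R sketch that avoids reshaping altogether, assuming the same window as above (birth year through birth year + 5, inclusive): outer() builds a logical matrix marking which year columns fall inside each person's window, and the row means are taken over just those cells. A dob later than the last year column would leave no cells selected and give NaN.

```r
final <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                    Zone = c("A", "B", "C", "D", "A", "B"),
                    dob = c(1993, 1997, 1994, 2001, 1999, 1993),
                    Ozone_1993 = c(0.12, 0.01, 0.36, 0.78, 0.12, 0.01),
                    Ozone_1994 = c(0.75, 0.23, 0.14, 0.98, 0.75, 0.23),
                    Ozone_1995 = c(1.38, 0.45, -0.08, 1.18, 1.38, 0.45),
                    Ozone_1996 = c(2.01, 0.67, -0.3, 1.38, 2.01, 0.67),
                    Ozone_1997 = c(2.64, 0.89, -0.52, 1.58, 2.64, 0.89),
                    Ozone_1998 = c(3.27, 1.11, -0.74, 1.78, 3.27, 1.11),
                    Ozone_1999 = c(3.9, 1.33, -0.96, 1.98, 3.9, 1.33),
                    Ozone_2000 = c(4.53, 1.55, -1.18, 2.18, 4.53, 1.55),
                    Ozone_2001 = c(5.16, 1.77, -1.4, 2.38, 5.16, 1.77),
                    Ozone_2002 = c(5.79, 1.99, -1.62, 2.58, 5.79, 1.99),
                    Ozone_2003 = c(6.42, 2.21, -1.84, 2.78, 6.42, 2.21),
                    Ozone_2004 = c(7.05, 2.43, -2.06, 2.98, 7.05, 2.43))

ozone_cols <- grep("^Ozone_", names(final), value = TRUE)
years <- as.integer(sub("^Ozone_", "", ozone_cols))
ozone <- as.matrix(final[ozone_cols])

# keep[i, j] is TRUE when year j lies in [dob_i, dob_i + 5]
keep <- outer(final$dob, years, function(d, y) y >= d & y <= d + 5)

# Sum of the selected cells divided by how many were selected
final$mean_under5_ozone <- rowSums(ozone * keep) / rowSums(keep)
final[c("ID", "dob", "mean_under5_ozone")]
```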
A few thoughts, using random data.
set.seed(42)
dat <- data.frame(dob = sample(1990:2020, size=1000, replace=TRUE), Ozone_1993=runif(1000), Ozone_1994=runif(1000), Ozone_1995=runif(1000))
head(dat)
# dob Ozone_1993 Ozone_1994 Ozone_1995
# 1 2006 0.37383448 0.68624969 0.1681480
# 2 1994 0.46496563 0.29309851 0.8198724
# 3 1990 0.04660819 0.41994895 0.7501070
# 4 2014 0.98751620 0.73526105 0.2899959
# 5 1999 0.90845233 0.84982125 0.1798130
# 6 1993 0.97939015 0.07746459 0.6172919
tidyverse
library(dplyr)
dat %>%
filter(dob >= 2015) %>%
summarize(across(starts_with("Ozone"), mean))
# Ozone_1993 Ozone_1994 Ozone_1995
# 1 0.5242029 0.4852803 0.4864364
That gives the average per year. If you instead need a single statistic across all years, then:
# library(tidyr) # pivot_longer
dat %>%
filter(dob >= 2015) %>%
tidyr::pivot_longer(starts_with("Ozone")) %>%
summarize(value = mean(value))
# # A tibble: 1 x 1
# value
# <dbl>
# 1 0.499
data.table
library(data.table)
datDT <- as.data.table(dat)
datDT[ dob >= 2015, ][, lapply(.SD, mean), .SDcols = patterns("^Ozone")]
# Ozone_1993 Ozone_1994 Ozone_1995
# 1: 0.5242029 0.4852803 0.4864364
melt(datDT[ dob >= 2015, ], "dob")[, .(value = mean(value))]
# value
# 1: 0.4986398
Base R
apply(subset(dat, dob >= 2015, select = Ozone_1993:Ozone_1995), 2, mean)
# Ozone_1993 Ozone_1994 Ozone_1995
# 0.5242029 0.4852803 0.4864364
mean(unlist(subset(dat, dob >= 2015, select = Ozone_1993:Ozone_1995)))
# [1] 0.4986398
Using R, I have two line graphs, each in its own chart. I need them both on the same chart. I have looked at other Stack Overflow questions but did not find one that matches my need.
In the combined chart I need lines for BA and careerBA on the vertical axis, with G on the horizontal axis. Each G value (e.g., 1) has a matching BA and careerBA.
BA <- c(0.317, 0.298, 0.273, 0.280, 0.252, 0.204, 0.181, 0.241, 0.227, 0.233, 0.080, 0.285)
careerBA <- c(0.279, 0.280, 0.245, 0.253, 0.276, 0.247, 0.265, 0.243, 0.274, 0.255, 0.236, 0.287)
G <- c(1,2,3,4,5,6,7,8,9,10,11,12)
df <- data.frame(BA, careerBA, G)
df
library(ggplot2)
p1 <- ggplot() + geom_line(aes(y = BA, x = G), data = df, color = "red") + labs(title = "All Mets Age 37 Season", x = "Games", y = "Batting Average", caption = "Age 37 Mets") + scale_x_continuous(breaks=seq(0,160,20))
p1
p2 <- ggplot() + geom_line(aes(y = careerBA, x = G), data = df, color = "blue") + labs(title = "All Mets Age 37 Career", x = "Games", y = "Career Batting Average", caption = "Age 37 Mets") + scale_x_continuous(breaks=seq(0,160,20))
p2
Here is my dput:
structure(list(BA = c(0.317, 0.298, 0.273, 0.28, 0.252, 0.204,
0.181, 0.241, 0.227, 0.233, 0.08, 0.285), careerBA = c(0.279,
0.28, 0.245, 0.253, 0.276, 0.247, 0.265, 0.243, 0.274, 0.255,
0.236, 0.287), G = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)), class = "data.frame", row.names = c(NA,
-12L))
One solution is to reshape your data into a longer format (here using the pivot_longer function from the tidyr package):
library(dplyr)
library(tidyr)
df %>% pivot_longer(-G, names_to = "var",values_to = "val")
# A tibble: 24 x 3
G var val
<dbl> <chr> <dbl>
1 1 BA 0.317
2 1 careerBA 0.279
3 2 BA 0.298
4 2 careerBA 0.28
5 3 BA 0.273
6 3 careerBA 0.245
7 4 BA 0.28
8 4 careerBA 0.253
9 5 BA 0.252
10 5 careerBA 0.276
# … with 14 more rows
If you want to add the plotting part, you can write:
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
df %>% pivot_longer(-G, names_to = "var",values_to = "val") %>%
ggplot(aes(x = G, y = val, color = var))+
geom_line()+
scale_y_continuous(labels = function(x){format(x, nsmall = 3)})
Is this what you are looking for?
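If you prefer to keep the data wide, a sketch of another option is two geom_line() layers on one plot, each mapping a constant label to color so ggplot builds a legend; scale_color_manual() then reuses the red and blue from the question's two separate plots. The axis labels and caption are taken from the question's code.

```r
library(ggplot2)

df <- data.frame(
  BA = c(0.317, 0.298, 0.273, 0.280, 0.252, 0.204, 0.181, 0.241, 0.227, 0.233, 0.080, 0.285),
  careerBA = c(0.279, 0.280, 0.245, 0.253, 0.276, 0.247, 0.265, 0.243, 0.274, 0.255, 0.236, 0.287),
  G = 1:12
)

p <- ggplot(df, aes(x = G)) +
  geom_line(aes(y = BA, color = "BA")) +
  geom_line(aes(y = careerBA, color = "careerBA")) +
  # Map each legend key to the colour used in the original single-line plots
  scale_color_manual(values = c(BA = "red", careerBA = "blue")) +
  labs(x = "Games", y = "Batting Average", caption = "Age 37 Mets", color = NULL)

p
```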
I am a beginner in R and in coding in general. I have a data frame that looks like this:
Date Week Spend
1 2019-07-14 2019-07-08 1.81
2 2019-07-13 2019-07-08 1.31
3 2019-07-12 2019-07-08 1.56
4 2019-07-11 2019-07-08 0.45
5 2019-07-10 2019-07-08 5.00
The full data covers several weeks.
First, I need to group the data by week and sum the values.
So far I have tried this:
df$nweek = (rep(1:15, each= 7))
Results:
Date Week Spend nweek
1 2019-07-14 2019-07-08 1.81 1
2 2019-07-13 2019-07-08 1.31 1
3 2019-07-12 2019-07-08 1.56 1
4 2019-07-11 2019-07-08 0.45 1
5 2019-07-10 2019-07-08 5.00 1
6 2019-07-09 2019-07-08 3.59 1
7 2019-07-08 2019-07-08 4.08 1
8 2019-07-07 2019-07-01 2.83 2
9 2019-07-06 2019-07-01 1.38 2
10 2019-07-05 2019-07-01 1.59 2
11 2019-07-04 2019-07-01 0.93 2
12 2019-07-03 2019-07-01 1.50 2
13 2019-07-02 2019-07-01 3.22 2
14 2019-07-01 2019-07-01 6.20 2
15 2019-06-30 2019-06-24 5.47 3
16 2019-06-29 2019-06-24 1.77 3
so that I have an "id" for each week. However, for some reason I cannot group my data frame by this sequence of numbers:
df = df %>% group_by(nweek) %>%
summarise (Spend = sum(Spend))
Instead, the result has only one row, containing the sum of Spend over the whole data frame.
I tried converting the nweek column with as.character, but that didn't work.
Second, after grouping the data frame by week, I'm trying to calculate the mean and standard deviation for each week and return those values as new columns in the data frame. How can I do this?
Thanks
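The question doesn't show which packages are loaded, so this is only a guess: a well-known cause of exactly this symptom (a single row holding the grand total) is attaching plyr after dplyr, so plyr's summarise() masks dplyr's and ignores the grouping. A quick check is to call dplyr::summarise() explicitly; here it is on a few rows of the question's data:

```r
library(dplyr)

df <- data.frame(Spend = c(1.81, 1.31, 2.83, 1.38),
                 nweek = c(1, 1, 2, 2))

weekly <- df %>%
  group_by(nweek) %>%
  # Explicit namespace, in case another package's summarise() masks dplyr's
  dplyr::summarise(Spend = sum(Spend))

weekly
```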
I would make one slight change to Ryan John's great solution. You can use mutate() to modify the Date, Week, and week_num columns all in one pipe.
df <- tibble::tribble(
~Date, ~Week, ~Spend, ~nweek,
"7/14/2019", "7/8/2019", 1.81, 1,
"7/13/2019", "7/8/2019", 1.31, 1,
"7/12/2019", "7/8/2019", 1.56, 1,
"7/11/2019", "7/8/2019", 0.45, 1,
"7/10/2019", "7/8/2019", 5.95, 1,
"7/9/2019", "7/8/2019", 3.59, 1,
"7/8/2019", "7/8/2019", 4.08, 1,
"7/7/2019", "7/1/2019", 2.83, 2,
"7/6/2019", "7/1/2019", 1.38, 2,
"7/5/2019", "7/1/2019", 1.59, 2,
"7/4/2019", "7/1/2019", 0.93, 2,
"7/3/2019", "7/1/2019", 1.5, 2,
"7/2/2019", "7/1/2019", 3.22, 2,
"7/1/2019", "7/1/2019", 6.2, 2,
"6/30/2019", "6/24/2019", 5.47, 3,
"6/29/2019", "6/24/2019", 1.77, 3
)
library(lubridate)
library(dplyr)
df %>%
mutate(Date = mdy(Date),
Week = mdy(Week),
week_num = week(Date)) %>%
group_by(week_num) %>%
summarise(spend_sum = sum(Spend),
spend_sd = sd(Spend))
#> # A tibble: 3 x 3
#> week_num spend_sum spend_sd
#> <dbl> <dbl> <dbl>
#> 1 26 13.4 2.38
#> 2 27 15.5 1.16
#> 3 28 14.7 2.00
Created on 2019-07-17 by the reprex package (v0.2.1)
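One caveat: lubridate's week() counts complete seven-day periods since January 1, plus one, so its groups don't line up with the question's Week column, whose weeks start on Mondays. For example, 7/8/2019 falls in week() 27 while 7/9/2019 falls in week() 28, even though both have Week 7/8/2019. If the Week column is already correct, a sketch that simply groups by it (using four rows of the question's data) looks like this; lubridate::isoweek(), which follows ISO 8601 Monday-start weeks, is another option.

```r
library(dplyr)
library(lubridate)

df <- tibble::tribble(
  ~Date,      ~Week,      ~Spend,
  "7/9/2019", "7/8/2019", 3.59,
  "7/8/2019", "7/8/2019", 4.08,
  "7/2/2019", "7/1/2019", 3.22,
  "7/1/2019", "7/1/2019", 6.2
)

df <- df %>% mutate(Date = mdy(Date), Week = mdy(Week))

# week() splits these four dates into three groups (28, 27, 27, 26)
week(df$Date)

# Grouping by the existing Week column keeps Monday-to-Sunday weeks together
weekly <- df %>%
  group_by(Week) %>%
  summarise(spend_sum = sum(Spend), spend_sd = sd(Spend), .groups = "drop")

weekly
```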
Try this:
library(tibble)
df <- tibble::tribble(
~Date, ~Week, ~Spend, ~nweek,
"7/14/2019", "7/8/2019", 1.81, 1,
"7/13/2019", "7/8/2019", 1.31, 1,
"7/12/2019", "7/8/2019", 1.56, 1,
"7/11/2019", "7/8/2019", 0.45, 1,
"7/10/2019", "7/8/2019", 5.95, 1,
"7/9/2019", "7/8/2019", 3.59, 1,
"7/8/2019", "7/8/2019", 4.08, 1,
"7/7/2019", "7/1/2019", 2.83, 2,
"7/6/2019", "7/1/2019", 1.38, 2,
"7/5/2019", "7/1/2019", 1.59, 2,
"7/4/2019", "7/1/2019", 0.93, 2,
"7/3/2019", "7/1/2019", 1.5, 2,
"7/2/2019", "7/1/2019", 3.22, 2,
"7/1/2019", "7/1/2019", 6.2, 2,
"6/30/2019", "6/24/2019", 5.47, 3,
"6/29/2019", "6/24/2019", 1.77, 3
)
library(lubridate)
df$Date <- lubridate::mdy(df$Date)
df$Week <- lubridate::mdy(df$Week)
df$week_num <- lubridate::week(df$Date)
library(dplyr)
df %>%
group_by(week_num) %>%
summarise(spend_sum = sum(Spend),
spend_sd = sd(Spend))
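The answers above collapse the data to one row per week with summarise(). For the second part of the question (returning each week's mean and standard deviation as new columns in the data frame), a grouped mutate() keeps every row instead; a minimal sketch with a few of the question's values and its nweek id:

```r
library(dplyr)

df <- data.frame(Spend = c(1.81, 1.31, 2.83, 1.38),
                 nweek = c(1, 1, 2, 2))

df <- df %>%
  group_by(nweek) %>%
  # Each row receives its own week's statistics
  mutate(spend_mean = mean(Spend),
         spend_sd = sd(Spend)) %>%
  ungroup()

df
```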