I have a dataset with multiple sites and sampling years, with a score for every day of the year. For example, SiteA has 40 years of data with a value for every day, and the sampling year is recorded in Sampling.Year. To make it more confusing, our sampling year runs from July to June, so it takes the form 2016-2017.
For example:
SiteName Sampling.Year Date Score
A 2015-2016 1
A 2015-2016 5
A 2015-2016 2
A 2016-2017 3
A 2016-2017 12
A 2016-2017 6
B 2015-2016 9
B 2015-2016 2
B 2015-2016 1
B 2016-2017 4
B 2016-2017 1
B 2016-2017 7
I want to apply a rolling 182-day average across this data and find the maximum 182-day average score for each site/Sampling.Year combination. The outcome would be, e.g.:
Site Sampling.Year MaxAve StartDate
A 2016-2017 7.5 01/10/2016
A 2017-2018 6.0 12/12/2017
B 2016-2017 2.3 13/11/2016
B 2017-2018 4.2 09/09/2017
I have saved a sample dataset here:
Sample data.
I want to use a loop (because I am a novice and I'm not sure of a better way) along the lines of the function below, but it's the grouping by site and year that I'm finding tricky. Ideally, I would like to export the moving average as a new data frame with the start and end date (or at least the start date) of each window, so we can check it against the weather conditions at the time.
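Moving_Average_Function <- function(arr, n = 182) {
res <- rep(NA_real_, length(arr)) # NA until a full window of n values is available
if (length(arr) < n) return(res) # not enough data for even one window
for (i in n:length(arr)) {
res[i] <- mean(arr[(i - n + 1):i]) # mean of the n values ending at position i
}
res
}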
Thanks in advance
If you are willing to use external packages, you could use group_by() from the dplyr package and the roll_mean() family of functions from the RcppRoll package (roll_meanr(), used below, is the right-aligned version, which pads the start with NA). RcppRoll has a set of fast, flexible functions for calculating moving averages.
I would also convert your DATE column to the Date class so that it sorts correctly.
library(dplyr) # I would typically use library(tidyverse) to load both dplyr and tidyr (among other related packages)
library(tidyr)
library(lubridate)
library(RcppRoll)
my_data <- data.table::fread("DailyScore.csv") # easy way to load a data frame from file
my_data2 <- my_data %>%
mutate(DATE = dmy(DATE)) %>% # Converting to Date format
pivot_longer(H1:T2,
names_to = "Sensor",
values_to = "data"
) %>% # convert column names to data
group_by(STATION, Sensor) %>% # so the rolling mean doesn't run across sites or sensors
arrange(STATION, DATE) %>% # to be sure you are in order for the rolling mean
# The STATION argument isn't necessary, but helps for display
mutate(Mean_182 = roll_meanr(data, 182)) %>% # New column with your rolling mean
pivot_wider(names_from = Sensor, values_from = c(data, Mean_182)) # converts back to original "wide" format
my_data2[180:195,]
# # A tibble: 16 x 14
# # Groups: STATION [1]
# STATION SITENAME Sampling.Year DATE data_H1 data_I1 data_H2 data_P2 data_T2 Mean_182_H1
# <chr> <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Site A Site A 1979-1980 1980-06-28 2.85 1.06e- 9 2.10 0.762 2.85 NA
# 2 Site A Site A 1979-1980 1980-06-29 2.79 1.62e-12 2.06 0.744 2.79 NA
# 3 Site A Site A 1979-1980 1980-06-30 2.75 1.00e-11 2.04 0.732 2.75 2.70
# 4 Site A Site A 1980-1981 1980-07-01 2.72 1.00e-11 2.01 0.724 2.72 2.71
# 5 Site A Site A 1980-1981 1980-07-02 2.70 1.00e-11 2.00 0.720 2.70 2.73
# 6 Site A Site A 1980-1981 1980-07-03 2.68 1.00e-11 1.98 0.718 2.68 2.74
# 7 Site A Site A 1980-1981 1980-07-04 2.67 1.00e-11 1.97 0.719 2.67 2.75
# 8 Site A Site A 1980-1981 1980-07-05 2.65 1.11e- 9 1.95 0.708 2.65 2.76
# 9 Site A Site A 1980-1981 1980-07-06 2.62 2.77e-10 1.93 0.703 2.62 2.76
# 10 Site A Site A 1980-1981 1980-07-07 2.60 3.18e-12 1.92 0.700 2.60 2.77
# 11 Site A Site A 1980-1981 1980-07-08 2.59 1.00e-11 1.90 0.701 2.59 2.79
# 12 Site A Site A 1980-1981 1980-07-09 2.59 1.00e-11 1.89 0.706 2.59 2.80
# 13 Site A Site A 1980-1981 1980-07-10 2.59 1.00e-11 1.89 0.713 2.59 2.81
# 14 Site A Site A 1980-1981 1980-07-11 2.59 1.00e-11 1.88 0.722 2.59 2.82
# 15 Site A Site A 1980-1981 1980-07-12 2.60 1.00e-11 1.88 0.731 2.60 2.83
# 16 Site A Site A 1980-1981 1980-07-13 2.60 1.00e-11 1.87 0.741 2.60 2.84
# # ... with 4 more variables: Mean_182_I1 <dbl>, Mean_182_H2 <dbl>, Mean_182_P2 <dbl>, Mean_182_T2 <dbl>
A couple of things to keep in mind, which will affect how you set this up:
In general, rolling averages return NA when they don't have a complete window. So with a 182-day average, you'll get a series of 181 NAs before your first complete average.
You'll want to decide how to handle the roll-over between sampling years. Especially with a long rolling window, if you don't want to mix sampling years, you'll have about half of each year (the first 181 days) without data.
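To make the NA and roll-over points concrete, here is a minimal base R sketch. It uses stats::filter() as a trailing moving average (a swap-in for roll_meanr()/rollmean(), not what the answers use) on the six Site A scores from the question, with a 3-day window chosen purely so the output is easy to read. The helper name trailing_mean() is hypothetical.

```r
# Site A scores from the question's example table
score <- c(1, 5, 2, 3, 12, 6)
year  <- rep(c("2015-2016", "2016-2017"), each = 3)
n <- 3  # illustrative window length; the real analysis would use 182

# Trailing mean: with sides = 1 each value averages the current score and the
# n - 1 scores before it, so the first n - 1 positions are NA
trailing_mean <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

# Ignoring sampling year: windows run across the 1 July boundary
ungrouped <- trailing_mean(score, n)  # NA, NA, 8/3, 10/3, 17/3, 7

# Within each sampling year: ave() applies the function group by group, so the
# window restarts each year and every year starts with n - 1 NAs
grouped <- ave(score, year, FUN = function(x) trailing_mean(x, n))  # NA, NA, 8/3, NA, NA, 7
```

Grouping by year loses two of the four windows here; with n = 182 that becomes roughly the first half of every sampling year. Note that stats::filter() stops with an error if a group is shorter than the window.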
Using loops would be very inefficient for this kind of operation. Instead, you can use dedicated functions that work by group, and zoo::rollmean to get the rolling mean.
library(dplyr)
DailyScore %>%
# assumes rows are already in date order within each site and sampling year
group_by(SITENAME, Sampling.Year) %>%
summarise(max_average = max(zoo::rollmean(Score, 182)))
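The question also asks for the start (and ideally end) date of the winning window. A base R sketch of that, assuming columns named SITENAME, Sampling.Year, DATE (already of class Date) and Score as in the answers above: each site/year group is sorted by date, a trailing mean is computed with stats::filter(), and the position of the maximum gives the window's end, from which the start date follows. The helper max_window() and the toy data (question scores with made-up consecutive dates) are hypothetical, and a 3-day window keeps the toy result checkable.

```r
# Hypothetical toy data: scores from the question, with made-up consecutive
# dates straddling the 30 June / 1 July sampling-year boundary
daily <- data.frame(
  SITENAME      = rep(c("A", "B"), each = 6),
  Sampling.Year = rep(rep(c("2015-2016", "2016-2017"), each = 3), times = 2),
  DATE          = rep(as.Date("2016-06-28") + 0:5, times = 2),
  Score         = c(1, 5, 2, 3, 12, 6, 9, 2, 1, 4, 1, 7),
  stringsAsFactors = FALSE
)

# For one site/year group: maximum trailing n-day mean plus its window dates
max_window <- function(d, n = 182) {
  d <- d[order(d$DATE), ]
  if (nrow(d) < n) return(NULL)  # no complete window in this group
  avg <- as.numeric(stats::filter(d$Score, rep(1 / n, n), sides = 1))
  i <- which.max(avg)            # which.max() skips the leading NAs
  data.frame(SITENAME = d$SITENAME[1], Sampling.Year = d$Sampling.Year[1],
             MaxAve = avg[i], StartDate = d$DATE[i - n + 1], EndDate = d$DATE[i],
             stringsAsFactors = FALSE)
}

groups <- split(daily, list(daily$SITENAME, daily$Sampling.Year), drop = TRUE)
result <- do.call(rbind, lapply(groups, max_window, n = 3))  # n = 182 on real data
rownames(result) <- NULL
result
```

stats::filter is written with its namespace because dplyr::filter() masks it once dplyr is attached. Groups shorter than the window are dropped rather than returning -Inf.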
Related
Suppose I am using panel data: for each individual and time, there is an observation of a numerical variable. I want to apply a function to this numerical variable but this function outputs a vector of numbers. I'd like to apply this function over the observations of each individual and store the resulting vector as columns of a new dataframe.
Example:
TICKER OFTIC CNAME ANNDATS_ACT ACTUAL
<chr> <chr> <chr> <date> <dbl>
1 0001 EPE EP ENGR CORP 2019-05-08 -0.15
2 0004 ACSF AMERICAN CAPITAL 2014-08-04 0.29
3 000R CRCM CARECOM 2018-02-27 0.32
4 000V EIGR EIGER 2018-05-11 -0.84
5 000Y RARE ULTRAGENYX 2016-02-25 -1.42
6 000Z BIOC BIOCEPT 2018-03-28 -54
7 0018 EGLT EGALET 2016-03-08 -0.28
8 001A SESN SESEN BIO 2021-03-15 -0.11
9 001C ARGS ARGOS 2017-03-16 -7
10 001J KN KNOWLES 2021-02-04 0.38
For each TICKER, I will consider the time-series implied by ACTUAL and compute the autocorrelation function. I defined the following wrapper to perform the operation:
my_acf <- function(x, lag = NULL){
acf_vec <- acf(x, lag.max = lag, plot = FALSE, na.action = na.contiguous)$acf
acf_vec <- as.vector(acf_vec)[-1]
return(acf_vec)
}
If the desired maximum lag is, say, 3, I'd like to create another dataset with 4 columns: TICKER and the corresponding first 3 autocorrelations of the associated series of ACTUAL observations.
My solution was:
max_lag = 3
autocorrs <- final_sample %>%
group_by(TICKER) %>%
filter(!all(is.na(ACTUAL))) %>%
summarise(rho = my_acf(ACTUAL, lag = max_lag)) %>%
mutate(order = row_number()) %>%
pivot_wider(id_cols = TICKER, values_from = rho, names_from = order, names_prefix = "rho_")
This indeed provides the desired output:
TICKER rho_1 rho_2 rho_3
<chr> <dbl> <dbl> <dbl>
1 0001 0.836 0.676 0.493
2 0004 0.469 -0.224 -0.366
3 000R 0.561 0.579 0.327
4 000V 0.634 0.626 0.604
5 000Y 0.370 0.396 0.117
6 000Z 0.476 0.454 0.382
7 0018 0.382 -0.0170 -0.278
8 001A 0.330 0.316 0.0944
9 001C 0.727 0.590 0.400
10 001J 0.281 -0.308 -0.0343
My question is: how can one perform this operation without pivot_wider() and the manual creation of the order column? The summarise verb creates a single column that stores the autocorrelations sequentially for each TICKER. Is there a way to force summarise to create a separate column for each output a given function returns when applied to, say, the ACTUAL series?
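One base R alternative (a sketch, not run on the asker's final_sample) is to let vapply() collect one fixed-length vector per TICKER into a matrix whose columns become rho_1 ... rho_k directly, with no order column and no pivot. The toy panel below is hypothetical random data. vapply() requires every ticker to return exactly max_lag values, so a series too short for max_lag lags (acf() caps lag.max at the series length minus one) raises an error instead of silently misaligning columns.

```r
my_acf <- function(x, lag = NULL) {
  acf_vec <- acf(x, lag.max = lag, plot = FALSE, na.action = na.contiguous)$acf
  as.vector(acf_vec)[-1]  # drop lag 0, which is always 1
}

# Hypothetical panel: two tickers, 20 observations each
set.seed(1)
panel <- data.frame(TICKER = rep(c("0001", "0004"), each = 20),
                    ACTUAL = rnorm(40), stringsAsFactors = FALSE)
max_lag <- 3

series <- split(panel$ACTUAL, panel$TICKER)           # one vector per ticker
series <- Filter(function(x) !all(is.na(x)), series)  # same role as the filter() step
# vapply() returns lags x tickers; t() gives one row per ticker
rho <- t(vapply(series, my_acf, numeric(max_lag), lag = max_lag))
colnames(rho) <- paste0("rho_", seq_len(max_lag))

autocorrs <- data.frame(TICKER = rownames(rho), rho,
                        row.names = NULL, stringsAsFactors = FALSE)
autocorrs
```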
I have a dataset of ingredients for cookies. I'm trying to answer which group (A, B, C, etc) of cookies has the most sugar in them. The dataset is structured as follows:
group id mois prot fat chocolate sugar carb cal
1 A 14069 27.82 21.43 44.87 5.11 1.77 0.77 4.93
2 A 14053 28.49 21.26 43.89 5.34 1.79 1.02 4.84
3 A 14025 28.35 19.99 45.78 5.08 1.63 0.80 4.95
4 B 14016 30.55 20.15 43.13 4.79 1.61 1.38 4.74
5 B 14005 30.49 21.28 41.65 4.82 1.64 1.76 4.67
6 A 14075 31.14 20.23 42.31 4.92 1.65 1.40 4.67
7 C 14082 31.21 20.97 41.34 4.71 1.58 1.77 4.63
8 C 14097 28.76 21.41 41.60 5.28 1.75 2.95 4.72
etc....
How can I plot the mean of each group to show that one of them has a higher average sugar content than the others? Or at least, how can I print out the grouped sugar averages to support my argument that one has more sugar than the others?
After saving your text to CSV and loading this file into R, it's pretty easy to obtain the mean sugar quantity per group, which I'm assuming is what you need.
You first group your data by variable group and then summarize the data using the "mean" function.
library(dplyr)
(cookies = df %>%
group_by(group) %>%
summarize(meanSugar = mean(sugar)))
group meanSugar
<chr> <dbl>
1 A 1.71
2 B 1.62
3 C 1.66
As you can see, based on your data, group A has a slightly higher sugar content than the others.
If you want to go a step further and actually plot this data, you can do this:
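If you'd rather stay in base R, or want the spread and group size next to the means (a difference of a few hundredths is hard to defend without them), here is a sketch with tapply() on only the eight rows shown in the question, so the numbers are illustrative.

```r
# The eight rows shown in the question (only the columns needed here)
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "A", "C", "C"),
  sugar = c(1.77, 1.79, 1.63, 1.61, 1.64, 1.65, 1.58, 1.75),
  stringsAsFactors = FALSE
)

# Mean, standard deviation and count of sugar per group
means <- tapply(df$sugar, df$group, mean)
sugar_summary <- data.frame(
  group = names(means),
  mean  = as.vector(means),
  sd    = as.vector(tapply(df$sugar, df$group, sd)),
  n     = as.vector(tapply(df$sugar, df$group, length)),
  stringsAsFactors = FALSE
)
sugar_summary[order(-sugar_summary$mean), ]  # highest mean first
```

For a quick base R plot of the same numbers, barplot(means) draws one bar per group.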
library(ggplot2)
cookies %>%
ggplot(aes(x=meanSugar,y=reorder(group,meanSugar),fill=group,label=meanSugar)) +
geom_col()+
labs(y="Cookie groups",x="Mean Sugar")+
geom_label(stat="identity",hjust=+1.2,color="white")+
theme(legend.position = "none")
If you have any questions on some of these steps, let me know!
Note: please try to provide your data in a reproducible form next time, so it's easy to reproduce what you need and give you a quick answer :)
I have a large dataset with a continuous variable "Cholesterol" measured at two visits for each participant (each participant has two rows: first visit = Before and second visit = After). I'd like to standardise cholesterol, but the Before and After visits are in the same column, so standardising the whole column would not be accurate: the mean and SD would be pooled across both visits.
Using base R, how can I create a new cholesterol variable in the same data set, standardised by Visit? The standardisation should be done twice (once for Before and once for After), but the standardised values should end up in a single variable again, following the same structure as this DF.
DF$Cholesterol<- c( 0.9861551,2.9154158, 3.9302373,2.9453085, 4.2248018,2.4789901, 0.9972635, 0.3879830, 1.1782336, 1.4065341, 1.0495609,1.2750138, 2.8515144, 0.4369885, 2.2410429, 0.7566147, 3.0395565,1.7335131, 1.9242212, 2.4539439, 2.8528908, 0.8432039,1.7002653, 2.3952744,2.6522959, 1.2178764, 2.3426695, 1.9030782,1.1708246,2.7267124)
DF$Visit< -c(Before,After,Before,After,Before,After,Before,After,Before,After,Before,After,Before,After,Before,After,Before,After,Before,After,Before,After,Before,After,Before, After,Before,After,Before,After)
# the standardisation function I want to apply
standardise <- function(x) {return((x-min(x,na.rm = T))/sd(x,na.rm = T))}
thank you in advance
Let's make your data, fix the df$visit assignment and change the standardise function to use the mean rather than the min. Then, assuming each new occurrence of "before" marks the next person, pivot to wide format and mutate the before and after standardised variables:
library(dplyr)
library(tidyr)
df <- data.frame(x = rep(1, 30)) # placeholder column; dropped by pivot_wider()
df$cholesterol<- c( 0.9861551,2.9154158, 3.9302373,2.9453085, 4.2248018,2.4789901, 0.9972635, 0.3879830, 1.1782336, 1.4065341, 1.0495609,1.2750138, 2.8515144, 0.4369885, 2.2410429, 0.7566147, 3.0395565,1.7335131, 1.9242212, 2.4539439, 2.8528908, 0.8432039,1.7002653, 2.3952744,2.6522959, 1.2178764, 2.3426695, 1.9030782,1.1708246,2.7267124)
df$visit <- rep(c("before", "after"), 15)
standardise <- function(x) {return((x-mean(x,na.rm = T))/sd(x,na.rm = T))}
df <- df %>%
mutate(person = cumsum(visit == "before"))%>%
pivot_wider(names_from = visit, id_cols = person, values_from = cholesterol)%>%
mutate(before_std = standardise(before),
after_std = standardise(after))
gives:
person before after before_std after_std
<int> <dbl> <dbl> <dbl> <dbl>
1 1 0.986 2.92 -1.16 1.33
2 2 3.93 2.95 1.63 1.36
3 3 4.22 2.48 1.91 0.842
4 4 0.997 0.388 -1.15 -1.49
5 5 1.18 1.41 -0.979 -0.356
6 6 1.05 1.28 -1.10 -0.503
7 7 2.85 0.437 0.609 -1.44
8 8 2.24 0.757 0.0300 -1.08
9 9 3.04 1.73 0.788 0.00940
10 10 1.92 2.45 -0.271 0.814
11 11 2.85 0.843 0.611 -0.985
12 12 1.70 2.40 -0.483 0.749
13 13 2.65 1.22 0.420 -0.567
14 14 2.34 1.90 0.126 0.199
15 15 1.17 2.73 -0.986 1.12
If you actually want min in your standardise function rather than mean, editing it should be simple enough.
Edited to add a base R solution, with the caveat that there's probably a much neater one:
df <- data.frame(id = rep(c(seq(1, 15, 1)), each = 2))
df$cholesterol<- c( 0.9861551,2.9154158, 3.9302373,2.9453085, 4.2248018,2.4789901, 0.9972635, 0.3879830, 1.1782336, 1.4065341, 1.0495609,1.2750138, 2.8515144, 0.4369885, 2.2410429, 0.7566147, 3.0395565,1.7335131, 1.9242212, 2.4539439, 2.8528908, 0.8432039,1.7002653, 2.3952744,2.6522959, 1.2178764, 2.3426695, 1.9030782,1.1708246,2.7267124)
df$visit <- rep(c("before", "after"), 15)
df <- reshape(df, direction = "wide", idvar = "id", timevar = "visit")
standardise <- function(x) {return((x-mean(x,na.rm = T))/sd(x,na.rm = T))}
df$before_std <- round(standardise(df$cholesterol.before), 2)
df$after_std <- round(standardise(df$cholesterol.after), 2)
gives:
id cholesterol.before cholesterol.after before_std after_std
1 1 0.9861551 2.9154158 -1.16 1.33
3 2 3.9302373 2.9453085 1.63 1.36
5 3 4.2248018 2.4789901 1.91 0.84
7 4 0.9972635 0.3879830 -1.15 -1.49
9 5 1.1782336 1.4065341 -0.98 -0.36
11 6 1.0495609 1.2750138 -1.10 -0.50
13 7 2.8515144 0.4369885 0.61 -1.44
15 8 2.2410429 0.7566147 0.03 -1.08
17 9 3.0395565 1.7335131 0.79 0.01
19 10 1.9242212 2.4539439 -0.27 0.81
21 11 2.8528908 0.8432039 0.61 -0.99
23 12 1.7002653 2.3952744 -0.48 0.75
25 13 2.6522959 1.2178764 0.42 -0.57
27 14 2.3426695 1.9030782 0.13 0.20
29 15 1.1708246 2.7267124 -0.99 1.12
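Both answers reshape to wide format. If you want to keep the original long structure, as the question asks (one standardised variable, same rows as DF), base R's ave() applies a function within each level of a grouping variable and returns a vector aligned with the original rows. A sketch, using the corrected Visit values and the mean-based standardise() from the answers:

```r
standardise <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

DF <- data.frame(
  Cholesterol = c(0.9861551, 2.9154158, 3.9302373, 2.9453085, 4.2248018, 2.4789901,
                  0.9972635, 0.3879830, 1.1782336, 1.4065341, 1.0495609, 1.2750138,
                  2.8515144, 0.4369885, 2.2410429, 0.7566147, 3.0395565, 1.7335131,
                  1.9242212, 2.4539439, 2.8528908, 0.8432039, 1.7002653, 2.3952744,
                  2.6522959, 1.2178764, 2.3426695, 1.9030782, 1.1708246, 2.7267124),
  Visit = rep(c("Before", "After"), 15),
  stringsAsFactors = FALSE
)

# Standardise within each Visit; the result stays in the original row order
DF$Cholesterol_std <- ave(DF$Cholesterol, DF$Visit, FUN = standardise)
head(DF)
```

The values match the before_std and after_std columns above, just interleaved back into one column.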
I need to prepare a table with the means and standard deviations of many variables for each level of several demographic variables.
Consider the following data:
df <- tibble(place=c("London","Paris","London","Rome","Rome","Madrid","Madrid"),gender=c("m","f","f","f","m","m","f"), education = c(1,1,2,3,5,5,3), var1 = c(2.2,3.1,4.5,1,5,1.4,2.3),var2 = c(4.2,2.1,2.5,4,5,4.4,1.3),var3 = c(0.2,0.1,3.5,3,5,2.4,4.3))
I would like to get a dataframe that contains the grouping variables (place, gender, education) and their levels (e.g., London, Paris, etc.) in the first column and their means and standard deviations for each variable starting with var (var1, var2, var3) in additional columns.
I know how to do this for one group and several variables at a time. However, since I need to repeat this dozens of times I am looking for a way to automate this process. It would be great to have a function to which I simply need to pass (a) the names of the grouping variables (e.g., gender, education) and (b) the variables from which to get the M / SD (e.g. var1, var2).
The solution I look for should look like this (the stats are not correct in the example below):
my_results <- tibble(grouping_vars = c("place_London","place_Paris","place_Rome","place_Madrid","gender_m","gender_f","last_element"),mean_var1=c(1.3,2.5,4.5,1.7,2.5,3.6,4.0),sd_var1=c(0.01,0.41,0.21,0.12,0.02,0.38,0.28),mean_var2=c(4.3,4.5,4.0,1.2,2.5,1.6,2.3),sd_var2=c(0.21,0.1,0.1,0.32,0.22,0.18,0.08),mean_var3=c(2.3,2.5,2.0,3.2,3.5,0.6,5),sd_var3=c(0.51,0.15,0.51,0.52,0.52,0.15,0.48))
grouping_vars mean_var1 sd_var1 mean_var2 sd_var2 mean_var3 sd_var3
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 place_London 1.3 0.01 4.3 0.21 2.3 0.51
2 place_Paris 2.5 0.41 4.5 0.1 2.5 0.15
3 place_Rome 4.5 0.21 4 0.1 2 0.51
4 place_Madrid 1.7 0.12 1.2 0.32 3.2 0.52
5 gender_m 2.5 0.02 2.5 0.22 3.5 0.52
6 gender_f 3.6 0.38 1.6 0.18 0.6 0.15
7 last_element 4 0.28 2.3 0.08 5 0.48
Since I typically work with tidyverse, I would particularly appreciate solutions that use these packages (probably dplyr or purrr?).
EDIT:
I thought there would be an elegant way to do this using map(). Maybe there is, but I haven't found it yet. In the meantime, I figured out a way that restructures the data into a suitable long format and then computes the statistics.
library(dplyr)
library(tidyr)
grouping_vars <- c("place", "gender", "education")
test_vars <- c("var1", "var2", "var3")
df %>%
# all grouping vars need to be of the same type, here "factor" is most appropriate
mutate_at(grouping_vars, list(factor)) %>%
# pivot longer, so that each row is a unique combination of grouping variable and grouping level
pivot_longer(
cols = all_of(grouping_vars),
names_to = "group_var",
values_to = "group_level"
) %>%
# merge grouping variable and group level into a single column
unite(var_level, group_var, group_level, sep = "_") %>%
# group by the combined variable_level label
group_by(var_level) %>%
# compute mean and sd for each test variable
summarise_at(test_vars, list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE)))
The result seems fine; e.g., the mean var1 of the two people who live in London is (2.2 + 4.5) / 2 = 3.35.
# A tibble: 10 x 7
var_level var1_mean var2_mean var3_mean var1_sd var2_sd var3_sd
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 education_1 2.65 3.15 0.15 0.636 1.48 0.0707
2 education_2 4.5 2.5 3.5 NA NA NA
3 education_3 1.65 2.65 3.65 0.919 1.91 0.919
4 education_5 3.2 4.7 3.7 2.55 0.424 1.84
5 gender_f 2.72 2.48 2.72 1.47 1.13 1.83
6 gender_m 2.87 4.53 2.53 1.89 0.416 2.40
7 place_London 3.35 3.35 1.85 1.63 1.20 2.33
8 place_Madrid 1.85 2.85 3.35 0.636 2.19 1.34
9 place_Paris 3.1 2.1 0.1 NA NA NA
10 place_Rome 3 4.5 4 2.83 0.707 1.41
Any thoughts on possible risks of this approach or how this could be improved?
One option is the describeBy function from psych:
library(psych)
describeBy(df,group = c("gender","education"), mat= TRUE)
Then subset what you want from there.
Another, surprisingly simple option with dplyr:
library(dplyr)
group.vars <- c("gender","education")
measure.vars <- c("var1","var2")
df %>%
group_by_at(group.vars) %>%
summarize_at(measure.vars,
list(mean =~ mean(.),sd =~ sd(.)))
# A tibble: 5 x 6
# Groups: gender [2]
gender education var1_mean var2_mean var1_sd var2_sd
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 f 1 3.1 2.1 NA NA
2 f 2 4.5 2.5 NA NA
3 f 3 1.65 2.65 0.919 1.91
4 m 1 2.2 4.2 NA NA
5 m 5 3.2 4.7 2.55 0.424
You can keep adding functions to that list. For each element, its name is appended to the variable name and its result becomes the column values. Recall that a formula such as ~ mean(.) is shorthand for function(x) mean(x), with . standing in for the argument.
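For comparison, here is a base R sketch of the long-format idea from the question's edit, written as a function of exactly the two inputs the question wanted to pass: the grouping variable names and the measured variable names. describe_levels() is a hypothetical helper. Each grouping variable is summarised on its own (not crossed with the others), and a level with a single observation gets an NA standard deviation, as in the question's output.

```r
describe_levels <- function(data, group_vars, test_vars) {
  pieces <- lapply(group_vars, function(g) {
    lev <- paste(g, data[[g]], sep = "_")  # e.g. "place_London"
    out <- data.frame(var_level = sort(unique(lev)), stringsAsFactors = FALSE)
    for (v in test_vars) {
      # tapply() returns one value per level; index it by level name
      out[[paste0(v, "_mean")]] <- as.vector(tapply(data[[v]], lev, mean, na.rm = TRUE)[out$var_level])
      out[[paste0(v, "_sd")]]   <- as.vector(tapply(data[[v]], lev, sd, na.rm = TRUE)[out$var_level])
    }
    out
  })
  do.call(rbind, pieces)  # stack the per-variable tables
}

# Data from the question, as a base data frame
df <- data.frame(place = c("London", "Paris", "London", "Rome", "Rome", "Madrid", "Madrid"),
                 gender = c("m", "f", "f", "f", "m", "m", "f"),
                 education = c(1, 1, 2, 3, 5, 5, 3),
                 var1 = c(2.2, 3.1, 4.5, 1, 5, 1.4, 2.3),
                 var2 = c(4.2, 2.1, 2.5, 4, 5, 4.4, 1.3),
                 var3 = c(0.2, 0.1, 3.5, 3, 5, 2.4, 4.3),
                 stringsAsFactors = FALSE)

res <- describe_levels(df, c("place", "gender", "education"), c("var1", "var2", "var3"))
res
```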
I'd like to store functions, or at least their names, in a column of a data.frame for use in a call to mutate. A simplified broken example:
library(dplyr)
expand.grid(mu = 1:5, sd = c(1, 10), stat = c('mean', 'sd')) %>%
group_by(mu, sd, stat) %>%
mutate(sample = get(stat)(rnorm(100, mu, sd))) %>%
ungroup()
If this worked how I thought it would, the value of sample would be generated by the function in .GlobalEnv corresponding to either 'mean' or 'sd', depending on the row.
The error I get is:
Error in mutate_impl(.data, dots) :
Evaluation error: invalid first argument.
Surely this has to do with non-standard evaluation ... grrr.
A few issues here. First, expand.grid() converts character values to factors, and get() doesn't work with factors (i.e. get(factor("mean")) gives an error). The tidyverse-friendly alternative is tidyr::crossing(). (You could also pass stringsAsFactors = FALSE to expand.grid().)
Secondly, mutate() assumes that all the functions you call are vectorized, but get() is not; it needs to be called one value at a time. Rather than relying on group_by() here, a safer way to guarantee one-at-a-time evaluation is rowwise().
And finally, your real problem is that you are calling get("sd"), but sd is also a column in your data frame and therefore visible inside mutate(). get() finds this sd first, and it is just a number, not a function. You need to tell get() explicitly to look in the global environment. Try:
tidyr::crossing(mu = 1:5, sd = c(1, 10), stat = c('mean', 'sd')) %>%
rowwise() %>%
mutate(sample = get(stat, envir = globalenv())(rnorm(100, mu, sd)))
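Both lookup problems described above (factors from expand.grid(), and get("sd") finding the column) can be reproduced in base R without dplyr. In this sketch, with() evaluates inside the data frame much as mutate() does, so a plain get("sd") returns the numeric column, while get(..., mode = "function") and match.fun() skip non-function objects and find stats::sd. The per-row mapply() call mirrors the sample3 approach in the next answer.

```r
grid <- expand.grid(mu = 1:5, sd = c(1, 10), stat = c("mean", "sd"))
is.factor(grid$stat)  # TRUE: expand.grid() defaults to stringsAsFactors = TRUE

grid <- expand.grid(mu = 1:5, sd = c(1, 10), stat = c("mean", "sd"),
                    stringsAsFactors = FALSE)

# Inside the data frame, "sd" is the numeric column, not the function
with(grid, is.function(get("sd")))                     # FALSE
with(grid, is.function(get("sd", mode = "function")))  # TRUE
with(grid, is.function(match.fun("sd")))               # TRUE

# One call per row: look up the statistic by name, then apply it to a sample
set.seed(0)
grid$sample <- mapply(function(f, m, s) match.fun(f)(rnorm(100, m, s)),
                      grid$stat, grid$mu, grid$sd, USE.NAMES = FALSE)
```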
Three problems (that I see): (1) expand.grid is giving you factors; (2) get finds variables, so using "sd" as a stat gets confused with the column named "sd" (that was hard to find!); and (3) this really is a row-wise operation, and grouping isn't helping enough. The first is easily fixed with an option, the second can be fixed by using match.fun instead of get, and the third can be mitigated with dplyr::rowwise, purrr::pmap, or base R's mapply.
This helper function was useful during debugging and can be used to "clean up" the code within mutate, but it isn't required (other than for this demonstration). Inline "anonymous" functions work as well.
func <- function(f,m,s) get(f)(rnorm(100,mean=m,sd=s))
Several implementation methods:
set.seed(0)
expand.grid(mu = 1:5, sd = c(1, 10), stat = c('mean', 'sd'),
stringsAsFactors=FALSE) %>%
group_by(mu, sd, stat) %>% # can also be `rowwise() %>%`
mutate(
sample0 = match.fun(stat)(rnorm(100, mu, sd)),
sample1 = purrr::pmap_dbl(list(stat, mu, sd), ~ match.fun(..1)(rnorm(100, ..2, ..3))),
sample2 = purrr::pmap_dbl(list(stat, mu, sd), func),
sample3 = mapply(function(f,m,s) match.fun(f)(rnorm(100,m,s)), stat, mu, sd),
sample4 = mapply(func, stat, mu, sd)
) %>%
ungroup()
# # A tibble: 20 x 8
# mu sd stat sample0 sample1 sample2 sample3 sample4
# <int> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 mean 1.02 1.03 0.896 1.08 0.855
# 2 2 1 mean 1.95 2.07 2.05 1.90 1.92
# 3 3 1 mean 2.93 3.07 3.03 2.89 3.01
# 4 4 1 mean 4.01 3.94 4.23 4.05 3.96
# 5 5 1 mean 5.04 5.11 5.05 5.17 5.19
# 6 1 10 mean 1.67 1.21 1.30 2.08 -0.641
# 7 2 10 mean 1.82 2.82 2.35 3.65 1.78
# 8 3 10 mean 1.45 3.10 3.15 4.28 2.58
# 9 4 10 mean 3.49 6.33 5.11 2.84 3.41
# 10 5 10 mean 5.33 4.85 4.07 5.58 6.66
# 11 1 1 sd 0.965 1.04 0.993 0.942 1.08
# 12 2 1 sd 0.974 0.967 0.981 0.984 1.15
# 13 3 1 sd 1.12 0.902 1.06 0.977 1.02
# 14 4 1 sd 0.946 0.928 0.960 1.01 0.992
# 15 5 1 sd 1.06 1.01 0.911 1.11 1.00
# 16 1 10 sd 9.46 8.95 10.0 8.91 9.60
# 17 2 10 sd 9.51 9.11 11.5 9.85 10.6
# 18 3 10 sd 9.77 9.96 11.0 9.09 10.7
# 19 4 10 sd 10.5 9.84 10.1 10.6 8.89
# 20 5 10 sd 11.2 8.82 10.4 9.06 9.64
sample0 happens to work because you have grouped it to be row-wise. If at some point any one grouping has two or more values, this will fail.
For sample1 through sample4, you can remove the group_by and it works equally well (though sample0 would then fail, so remove it too). You won't get identical results to those above with the grouping removed, because the random numbers are drawn in a different order.