Use select_helpers with dplyr::coalesce - r

I have a very wide dataframe (much larger than the data provided here for reprex).
Using the data provided below (assigned to my_wide_data), I would like to utilize dplyr::coalesce along with the select helpers from dplyr (e.g. dplyr::starts_with).
# dput output assigned to my_wide_data
structure(list(myvar1 = c(10L, 3L, 11L, 2L, 4L, 5L, 2L, 6L, 1L,
4L, 12L, 9L, 12L, 2L, 3L, 1L, 2L, 8L, 1L, 2L, 3L, 3L, 8L, 11L,
10L, 6L, 3L, 10L, 5L, 2L, 8L, 3L, 1L, 6L, 2L, 1L, 8L, 4L, 10L,
3L, 1L, 4L, 2L, 12L, 3L, 2L, 5L, 1L, 3L, 5L, 3L, 2L, 12L, 3L,
6L, 11L, 12L, 2L, 6L, 10L, 3L, 10L, 3L, 2L, 2L, 2L, 2L, 3L, 6L,
3L, 6L, 10L, 1L, 3L, 3L, 6L, 2L, 3L, 3L, 3L, 2L, 3L, 2L, 10L,
3L, 3L, 4L, 1L, 3L, 2L, 3L, 9L, 1L, 1L, NA, 5L, 1L, 8L, 3L, 10L,
3L, 3L, 4L, 7L, 10L, 2L, 2L, 11L, 6L, 11L, 6L, 4L, 4L, 12L, 6L,
6L, 1L, 2L, 11L, 2L, 2L, 11L, 3L, 2L, 3L, 2L, 2L, 3L, 3L, 9L,
2L, 1L, 1L, 4L, 2L, 8L, 2L, 10L, 6L, 3L, 1L, 6L, 2L, 10L, 3L,
5L, 6L, 3L, 4L, 10L, 9L, 3L, 4L, 3L, 2L, 3L, 9L, 3L, 3L, 1L,
10L, 4L, 4L, 6L, 2L, 7L, 3L, 2L, 3L, 1L, 3L, 3L, 3L, 7L, 2L,
2L, 6L, 2L, 4L, 3L, 3L, 4L, 2L, 4L, 2L, 5L, 5L, 3L, 6L, 5L, 4L,
5L, 4L, 4L, 10L, 1L, 9L, 4L, 4L, 4L, 4L, 8L, 6L, 5L), myvar2 = c(24L,
24L, 27L, 8L, 9L, 15L, 1L, 27L, 3L, 23L, 28L, 10L, 24L, 5L, 14L,
17L, 16L, 28L, 29L, 16L, 3L, 13L, 7L, 13L, 18L, 25L, 10L, 10L,
15L, 27L, 21L, 17L, 25L, 25L, 15L, 25L, 21L, 13L, 9L, 28L, 1L,
13L, 19L, 21L, 23L, 15L, NA, 29L, 12L, 25L, 1L, 5L, 12L, 7L,
15L, 25L, 4L, 8L, 30L, 25L, 8L, NA, 6L, 16L, 14L, 7L, 20L, 26L,
19L, 10L, 1L, 15L, 30L, 7L, 16L, 23L, 24L, 21L, 8L, 1L, 1L, 10L,
26L, 28L, 5L, 7L, 21L, 10L, 13L, 26L, 14L, 5L, 22L, 18L, NA,
NA, 9L, 20L, 17L, 23L, 3L, 13L, 7L, 5L, 6L, 9L, 8L, 15L, 9L,
10L, 15L, 13L, NA, 30L, 22L, 14L, 9L, 16L, 6L, 13L, 19L, 15L,
1L, 7L, 19L, 25L, 10L, NA, 8L, 25L, 5L, 2L, 16L, 8L, 19L, 18L,
27L, 2L, NA, 16L, 29L, 4L, 7L, 27L, 24L, 5L, 6L, 17L, 16L, 13L,
11L, NA, 12L, 9L, 8L, 1L, NA, 5L, 12L, 3L, 3L, 10L, 16L, 16L,
5L, 24L, 10L, 17L, 23L, 19L, 12L, 12L, 18L, 6L, 1L, 3L, 15L,
26L, 28L, 28L, 27L, 3L, 18L, 22L, 13L, 11L, 30L, 24L, 1L, 25L,
21L, 7L, 14L, 16L, 9L, 3L, 28L, 11L, 17L, 11L, 25L, 23L, 7L,
21L), myvar3 = c(78L, 79L, 78L, 78L, 79L, 78L, 79L, 77L, 79L,
79L, 76L, 78L, 78L, 79L, 79L, 79L, 79L, 78L, 79L, 79L, 79L, 79L,
78L, 78L, 78L, 79L, 79L, 78L, 78L, 79L, 78L, 79L, 79L, 78L, 79L,
79L, 78L, 78L, 78L, 79L, 79L, 79L, 79L, 78L, 79L, 79L, 73L, 79L,
79L, 79L, 79L, 79L, 72L, 79L, 78L, 78L, 78L, 79L, 78L, 78L, 79L,
78L, 79L, 79L, 79L, 79L, 79L, 78L, 78L, 79L, 78L, 78L, 79L, 79L,
79L, 76L, 79L, 78L, 79L, 79L, 79L, 79L, 79L, 75L, 79L, 79L, 79L,
79L, 79L, 79L, 79L, 78L, 79L, 79L, 77L, 78L, 79L, 78L, 79L, 78L,
79L, 79L, 79L, 78L, 78L, 79L, 79L, 78L, 78L, 78L, 78L, 79L, 79L,
78L, 78L, 76L, 79L, 76L, 77L, 79L, 79L, 78L, 79L, 79L, 79L, 79L,
79L, 79L, 79L, 78L, 78L, 79L, 78L, 79L, 79L, 78L, 79L, 78L, 79L,
79L, 79L, 79L, 79L, 78L, 79L, 79L, 77L, 79L, 79L, 78L, 78L, 79L,
78L, 79L, 79L, 79L, 78L, 79L, 79L, 79L, 78L, 79L, 79L, 78L, 79L,
78L, 79L, 79L, 78L, 79L, 79L, 79L, 79L, 79L, 79L, 79L, 78L, 79L,
78L, 79L, 79L, 79L, 79L, 79L, 78L, 79L, 79L, 79L, 79L, 79L, 79L,
79L, 78L, 79L, 78L, 79L, 78L, 79L, 79L, 79L, 79L, 76L, 78L, 79L
)), class = "data.frame", row.names = c(NA, -204L)) -> my_wide_data
In other words, instead of
my_wide_data %>%
  mutate(coalesce_var = coalesce(myvar1, myvar2, myvar3))
I would like to be able to do something like
my_wide_data %>%
  mutate(coalesce_var = coalesce(starts_with("my")))
QUESTION: Is it possible to accomplish something like this within dplyr or elsewhere in the tidyverse?

The following works because coalesce(...) takes any number of vectors, so a list of vectors can be spliced into the call with !!!
vecs <- list(
c(1, 2, NA, NA, 5),
c(NA, NA, 3, 4, 5)
)
coalesce(!!! vecs)
You can combine this with a select helper: select() returns a data frame, which is itself a list of columns, so it can be spliced in the same way
my_wide_data %>%
mutate(coalesce_var = coalesce(!!! select(., starts_with("my"))))
# myvar1 myvar2 myvar3 coalesce_var
# 1 10 24 78 10
# 2 3 24 79 3
# 3 11 27 78 11
# 4 2 8 78 2
# 5 4 9 79 4
# etc
EDIT Here's an alternative construction - which I prefer
library(rlang)
library(tidyselect)
my_wide_data %>%
mutate(coalesce_var = coalesce(!!! syms(vars_select(names(.), starts_with("my")))))
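A further variant, sketched here under the assumption that dplyr 1.1.0 or later is installed (pick() was introduced in that release): pick() returns the selected columns as a data frame inside mutate(), and do.call() hands those columns to coalesce() as separate arguments. The toy data frame and the name result are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Hypothetical toy data: three "my" columns plus one unrelated column
toy <- data.frame(
  myvar1 = c(1L, NA, NA),
  myvar2 = c(10L, 20L, NA),
  myvar3 = c(100L, 200L, 300L),
  other  = c("a", "b", "c")
)

# pick() selects columns with tidyselect helpers inside the data mask;
# do.call() passes each selected column to coalesce() as its own argument
result <- toy %>%
  mutate(coalesce_var = do.call(coalesce, pick(starts_with("my"))))

print(result)
```

The columns are coalesced in the order they appear in the data frame, so the leftmost non-missing value wins in each row.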

Related

R loess regression

I think I missed something in the use of the loess function and I can't understand what I did wrong. I have a data frame in which I store the output (count) of 3 different software tools for 26 different genes on the genomes of different patients. The 3 tools were each run on the same genome but with different downsampling rates.
I pooled the results of all the patients by gene. In the end I have a data frame with 4 columns: samplexxx (downsampling rate), software (name of the tool I used), gene (the name of the gene) and count (count result given by the tool).
My goal is to estimate the effect of downsampling (samplexxx) on the count given by each tool, and I want to fit a regression so that I can compare the tools with each other.
rate <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90,
95, 100)
My attempts:
datalist <- list()
for (i in 1:22) {
  name <- genes[i]
  print(name)
  mod <- paste("mod_", name)
  xfit <- paste("xfit_", name)
  df <- paste("df_", name)
  mod <- loess(data2[data2$gene == name,]$count ~
                 data2[data2$gene == name,]$samplexxx)
  xfit <- predict(mod, newdata=data2[data2$gene == name,]$samplexxx)
  df <- setNames(data.frame(matrix(ncol=4, nrow=60)),
                 c("down", "software", "gene", "loess"))
  df$down <- data2[data2$gene == name,]$samplexxx
  df$software <- data2[data2$gene == name,]$software
  df$gene <- data2[data2$gene == name,]$gene
  df$loess <- xfit
  print(xfit)
  datalist[[i]] <- df
}
data_loess <- do.call(rbind, datalist)
ggplot(data_loess, aes(x=gene, y=loess, fill=software)) +
  geom_boxplot()
and:
mod <- loess(data2$count ~ data2$samplexxx)
xfit <- predict(mod, newdata=data2$samplexxx)
for (i in 1:20) {
  down <- rate[i]
  print(name)
  title <- paste("loess_downsampling", down)
  out <- paste("loess_downsampling", down, ".pdf", sep="")
  pdf(out, width=10)
  print(ggplot(data2, aes(x=down, y=loess, fill=software)) +
          geom_boxplot() + ggtitle(title))
  dev.off()
}
Sample data:
> dput(data2)
structure(list(samplexxx = c(5L, 10L, 15L, 20L, 25L, 30L, 35L,
40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L,
5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L,
70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L,
35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L,
100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L,
65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L,
30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L,
95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L,
60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L,
25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L,
90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L,
55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L,
20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L,
85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L,
50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L,
15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L,
80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L,
45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L,
5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L,
70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L,
35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L,
100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L,
65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L,
30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L,
95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L,
60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L,
25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L,
90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L,
55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L,
20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L,
85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L,
50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L,
15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L,
80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L,
45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L,
5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L,
70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L,
35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L,
100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L,
65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L,
30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L,
95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L,
60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L,
25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L,
90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L,
55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L,
20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L,
85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L,
50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L,
15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L,
80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L,
45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L,
5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L,
70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L,
35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L,
100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L,
65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L,
30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L,
95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L,
60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L,
25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L,
90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L,
55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L,
20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L,
85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L,
50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L,
15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L,
80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L,
45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L,
5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L,
70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L,
35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L,
100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L,
65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L,
30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L,
95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L,
60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L,
25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L,
90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L,
55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L,
20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L,
85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L,
50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L,
15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L,
80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L,
45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L,
5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L,
70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L,
35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L,
100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L,
65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L,
30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L,
95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L,
60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L,
25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L,
90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L,
55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L, 15L,
20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L,
85L, 90L, 95L, 100L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L,
50L, 55L, 60L, 65L, 70L, 75L, 80L, 85L, 90L, 95L, 100L, 5L, 10L,
15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L,
80L, 85L, 90L, 95L, 100L), software = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("EH", "GangSTR", "Tred"), class = "factor"),
gene = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L,
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,
18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L,
19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L,
22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,
19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L,
19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L,
22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L,
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L,
19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L,
19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L,
22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L,
22L, 22L, 22L, 22L), .Label = c("AFF2", "AR", "ATN1", "ATXN1",
"ATXN10", "ATXN2", "ATXN3", "ATXN7", "C9ORF72", "CACNA1A",
"CBL", "CNBP", "CSTB", "DIP2B", "DMPK", "FMR1", "FXN", "HTT",
"JPH3", "NOP56", "PPP2R2B", "TBP"), class = "factor"), count = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, 24L, 24L, 24L, 24L, 24L,
24L, 24L, 24L, 24L, 24L, 24L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 17L, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 15L, 15L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, NA, NA, NA, NA, 20L, 34L, 31L, 33L, 34L, 34L, 34L, 34L,
34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, NA, NA, NA, NA, NA,
22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L,
22L, 22L, 22L, NA, NA, NA, NA, NA, 22L, 24L, 24L, 24L, 24L,
24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, NA, NA,
NA, NA, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, NA, NA, NA, NA, 6L, 8L, 8L,
8L, 8L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, NA, NA,
NA, NA, 11L, NA, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, NA, NA, NA, 12L, 5L, NA, 12L,
12L, 5L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, NA, NA, NA, NA, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 20L, 20L, 18L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, NA, NA, NA, NA, 27L, 24L,
21L, 14L, 27L, 14L, 21L, 27L, 27L, 14L, 27L, 27L, 27L, 27L,
27L, 27L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 68L, 73L,
78L, 54L, 79L, 76L, 87L, 72L, 62L, 63L, NA, NA, NA, NA, NA,
27L, 27L, 27L, 28L, 27L, 27L, 64L, 27L, 64L, 64L, 27L, 27L,
27L, 27L, 27L, NA, NA, NA, NA, NA, 18L, 20L, 18L, 20L, 20L,
18L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, NA, NA,
NA, NA, NA, 15L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 9L, 7L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, NA, NA, NA, NA, NA, 14L,
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
14L, 14L, NA, NA, NA, NA, NA, 35L, 29L, 35L, 35L, 30L, 35L,
32L, 35L, 35L, 35L, 35L, 35L, 35L, 35L, 35L, 11L, 19L, 19L,
19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L,
19L, 19L, 19L, 19L, 19L, 20L, 11L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 33L, 33L, 32L, 33L, 33L, 33L, 33L, 33L, 33L, 33L,
33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, NA, 21L,
22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L,
22L, 22L, 22L, 22L, 22L, 22L, 19L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 19L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 8L, 8L,
7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 11L, NA, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 7L, 15L, 15L, 13L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 27L, 19L, 27L, 27L, 27L,
27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L,
27L, 27L, NA, 76L, 23L, 23L, 23L, 32L, 65L, 32L, 28L, 32L,
28L, 32L, 32L, 23L, 28L, 32L, 28L, 28L, 32L, 84L, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 14L, 18L, 17L, 17L, 17L, 17L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 15L,
NA, NA, 15L, NA, 15L, NA, NA, 15L, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 9L, NA, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 14L, NA, 28L, 36L, 36L, NA, 36L, 36L, 36L,
36L, NA, 36L, NA, 36L, 36L, 36L, 36L, 36L, NA, 36L, 36L,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
1L, 8L, 18L, 16L, 15L, 14L, 15L, 16L, 15L, 16L, 14L, 15L,
14L, 14L, 14L, 14L, 16L, 16L, 16L, 16L, 31L, 28L, 31L, 31L,
32L, 32L, 32L, 33L, 31L, 33L, 32L, 31L, 32L, 32L, 32L, 32L,
32L, 32L, 32L, 32L, 7L, 18L, 22L, 22L, 22L, 22L, 22L, 22L,
22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L,
19L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 5L, 6L, 6L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 12L, 11L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 5L, 7L, 7L, 7L, 7L, 11L, 11L, 7L,
11L, 15L, 15L, 11L, 7L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
1L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 4L, 20L, 17L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 1L, 2L, 1L, 1L,
1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 1L, 15L, 6L, 22L, 13L, 14L, 13L, 14L, 13L, 14L, 14L,
27L, 27L, 14L, 14L, 27L, 14L, 27L, 14L, 27L, NA, 15L, 20L,
20L, 20L, 20L, 40L, 20L, 40L, 20L, 40L, 40L, 40L, 40L, 20L,
40L, 40L, 40L, 40L, 32L, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 15L, 14L,
17L, 17L, 17L, 19L, 17L, 13L, 17L, 17L, 17L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 17L, 5L, 3L, 1L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 5L, 3L,
1L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 12L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, NA,
2L, 3L, 2L, 29L, 33L, 33L, 35L, 33L, 35L, 35L, 33L, 35L,
35L, 33L, 35L, 35L, 35L, 35L, 35L)), class = "data.frame", row.names = c(NA,
-1320L))
I believe the LOESS should be fitted separately for each level of "software".
software <- unique(data2$software)
data_loess <- do.call(rbind, lapply(software, \(x) {
  X <- subset(data2, software == x)
  lo <- loess(count ~ samplexxx, X)
  count_pred <- predict(lo, newdata=X)
  return(cbind(X, count_pred))
}))
Note: the \(x) lambda shorthand requires R >= 4.1.0; this was run with R version 4.1.2 (2021-11-01)
Gives:
head(data_loess[data_loess$samplexxx > 80, ], 10)
# samplexxx software gene count count_pred
# 17 85 EH AFF2 24 22.69004
# 18 90 EH AFF2 24 22.31879
# 19 95 EH AFF2 24 21.83428
# 20 100 EH AFF2 24 21.25618
# 37 85 EH AR 21 22.69004
# 38 90 EH AR 21 22.31879
# 39 95 EH AR 21 21.83428
# 40 100 EH AR 21 21.25618
# 57 85 EH ATN1 NA 22.69004
# 58 90 EH ATN1 NA 22.31879
And here is a plot of the "count" predictions against "samplexxx".
plot(count_pred ~ samplexxx, data_loess, col=as.numeric(software) + 1,
     pch=20, xlab='Downsampling', ylab='Count (LOESS)')
legend('topleft', legend=software, pch=19, col=as.numeric(software) + 1,
       horiz=TRUE, cex=.7, title='Software')
It looks interesting, but I'm not sure whether it's absolutely right.
My answer uses something other than for loops, which is probably new to you; however, it's the R-ish way and it's much shorter to code. Here lapply() does the looping.
Anyway, I hope this helps.
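To make the split-and-fit pattern easier to try without the full dput, here is a minimal sketch on made-up data. The data frame toy, its columns rate, tool and count, and the simulated values are all assumptions for illustration, not the asker's data. It fits one LOESS curve per tool and attaches the fitted values, in the same way as the lapply() call above.

```r
set.seed(1)

# Hypothetical toy data: two tools, 20 downsampling rates each
toy <- data.frame(
  rate = rep(seq(5, 100, by = 5), times = 2),
  tool = rep(c("A", "B"), each = 20)
)
# Simulated counts: tool B sits about 10 units above tool A
toy$count <- ifelse(toy$tool == "A", 20, 30) + 0.05 * toy$rate + rnorm(40)

# Fit one LOESS per tool and keep the fitted values next to the data
fits <- lapply(split(toy, toy$tool), function(d) {
  fit <- loess(count ~ rate, data = d)
  d$count_pred <- predict(fit, newdata = d)
  d
})
toy_loess <- do.call(rbind, fits)

head(toy_loess)
```

split() does the subsetting that subset() does inside the lapply() above, so the grouping variable only has to be named once.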

Adding a year to a time series with only months and days

I have a df like this
month day x
1 1 1 84
2 1 2 43
3 1 3 49
4 1 4 67
5 1 5 59
......
366 12 31 97
The year should be 2019 for October to December and 2020 for January to September.
I tried to use
df$year<-as.date(df,origin='2019-01-01')
But I am not sure how to specify the argument.
I want to turn the year column into a date column, so I then tried
df$date<-as.date(with(paste("???",month,day,sep="-"), %Y-%m-%d,origin ="2019-01-01")
but again I don't know how to supply the year.
Any help would save me a lot of time, because doing this manually seems impossible.
We could use an ifelse statement with the make_date function from lubridate:
library(dplyr)
library(lubridate)
df %>%
mutate(year= ifelse(month %in% c(10,11,12), 2019, 2020),
date = make_date(year, month, day))
output:
month day x year date
1 1 1 84 2020 2020-01-01
2 1 2 43 2020 2020-01-02
3 11 3 49 2019 2019-11-03
4 1 4 67 2020 2020-01-04
5 1 5 59 2020 2020-01-05
366 12 31 97 2019 2019-12-31
You could use something like the code below. If you need a variable instead of the hard-coded 2019/2020, use var - 1 for October to December and var for January to September.
library(dplyr)
library(lubridate)
df1 %>%
# pick the year first, then parse a single date string per row;
# this avoids parsing e.g. 2019-2-29 for rows that belong to 2020
mutate(date = ymd(paste(if_else(month %in% 10:12, 2019, 2020),
month, day, sep = "-")))
data:
df1 <- data.frame(month = c(1:12), day = 1, x = 5)
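The "var" idea mentioned above can be sketched as follows. This is only an illustration in base R so that it runs without extra packages; `end_year` is a hypothetical name for that variable, not something from the question.

```r
# `end_year` is a hypothetical name for the "var" in the text:
# the year that applies to January-September
end_year <- 2020

df1 <- data.frame(month = 1:12, day = 1, x = 5)

# October-December belong to the previous year, everything else to end_year
df1$year <- ifelse(df1$month %in% 10:12, end_year - 1, end_year)

# ISOdate() builds a date-time at noon GMT; as.Date() drops the time part
df1$date <- as.Date(ISOdate(df1$year, df1$month, df1$day))
df1
```

Changing `end_year` is then the only edit needed to shift the whole series to another pair of years.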
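Using base R, you could locate the 30th of September (the last row that belongs to 2020), then build the date with ISOdate. Note that testing rowSums(d[1:2]) == 31 + 10 finds the 31st of October first, which would put October into 2020.
w <- which(d$month == 9 & d$day == 30)
d$year <- rep(c(2020, 2019), c(w, nrow(d) - w))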
d$date <- do.call(\(year, month, day, ...) as.Date(ISOdate(year, month, day)), d)
Result
head(d, 3)
# month day x year date
# 1 1 1 58 2020 2020-01-01
# 2 1 2 74 2020 2020-01-02
# 3 1 3 43 2020 2020-01-03
tail(d, 3)
# month day x year date
# 363 12 29 46 2019 2019-12-29
# 364 12 30 82 2019 2019-12-30
# 365 12 31 63 2019 2019-12-31
Note:
R.version.string
# [1] "R version 4.1.1 (2021-08-10)"
Data:
d <- structure(list(month = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L), day = c(1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L,
18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L,
31L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L,
27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L,
26L, 27L, 28L, 29L, 30L, 31L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L,
28L, 29L, 30L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L,
25L, 26L, 27L, 28L, 29L, 30L, 31L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L,
30L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L,
27L, 28L, 29L, 30L, 31L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L,
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L),
x = c(72L, 95L, 95L, 76L, 84L, 64L, 85L, 84L, 70L, 95L, 75L,
64L, 72L, 48L, 68L, 68L, 44L, 53L, 46L, 49L, 62L, 53L, 74L,
86L, 58L, 63L, 85L, 85L, 81L, 44L, 66L, 82L, 86L, 90L, 75L,
54L, 53L, 52L, 47L, 48L, 61L, 95L, 96L, 73L, 59L, 57L, 94L,
70L, 81L, 68L, 83L, 83L, 95L, 55L, 73L, 51L, 50L, 83L, 58L,
45L, 74L, 64L, 54L, 60L, 77L, 94L, 90L, 47L, 44L, 50L, 70L,
69L, 76L, 69L, 62L, 63L, 62L, 55L, 47L, 43L, 71L, 47L, 66L,
69L, 74L, 53L, 85L, 62L, 53L, 57L, 52L, 65L, 85L, 68L, 62L,
43L, 72L, 69L, 79L, 71L, 95L, 45L, 96L, 70L, 96L, 51L, 48L,
67L, 52L, 48L, 72L, 54L, 64L, 79L, 49L, 55L, 90L, 57L, 51L,
63L, 79L, 69L, 48L, 52L, 89L, 70L, 95L, 64L, 75L, 95L, 70L,
94L, 95L, 43L, 87L, 56L, 46L, 53L, 60L, 91L, 61L, 88L, 83L,
89L, 45L, 87L, 69L, 83L, 71L, 44L, 93L, 96L, 80L, 46L, 80L,
66L, 80L, 59L, 86L, 51L, 48L, 80L, 81L, 79L, 65L, 80L, 72L,
84L, 61L, 55L, 49L, 54L, 60L, 44L, 44L, 84L, 49L, 94L, 45L,
80L, 79L, 51L, 70L, 48L, 66L, 89L, 60L, 57L, 76L, 86L, 88L,
71L, 79L, 94L, 74L, 93L, 80L, 75L, 90L, 91L, 77L, 95L, 48L,
90L, 77L, 50L, 49L, 56L, 71L, 73L, 62L, 85L, 90L, 76L, 67L,
44L, 96L, 52L, 73L, 85L, 44L, 44L, 79L, 89L, 93L, 58L, 57L,
75L, 48L, 58L, 59L, 51L, 64L, 89L, 82L, 76L, 51L, 56L, 46L,
82L, 48L, 76L, 93L, 60L, 52L, 75L, 77L, 53L, 52L, 56L, 50L,
66L, 70L, 67L, 87L, 90L, 50L, 80L, 54L, 81L, 54L, 73L, 88L,
64L, 52L, 64L, 73L, 79L, 68L, 53L, 86L, 94L, 56L, 62L, 65L,
85L, 61L, 54L, 93L, 60L, 69L, 82L, 83L, 56L, 51L, 82L, 71L,
76L, 77L, 60L, 79L, 61L, 83L, 87L, 43L, 74L, 76L, 63L, 59L,
54L, 93L, 82L, 65L, 89L, 68L, 62L, 61L, 91L, 89L, 79L, 59L,
52L, 80L, 71L, 96L, 46L, 84L, 47L, 92L, 80L, 86L, 64L, 88L,
56L, 93L, 94L, 66L, 46L, 87L, 63L, 89L, 92L, 88L, 65L, 90L,
71L, 53L, 91L, 61L, 91L, 62L, 62L, 48L, 80L, 73L, 62L, 75L,
59L, 72L, 61L, 90L, 51L, 66L, 74L, 58L, 73L, 89L, 50L, 79L,
90L, 94L, 59L, 47L, 88L, 83L)), row.names = c(NA, -365L), class = "data.frame")
Base R option -
transform(df, date = as.Date(paste(ifelse(month %in% 10:12, 2019, 2020), month, day, sep = '-')))
# month day x date
#1 1 1 84 2020-01-01
#2 1 2 43 2020-01-02
#3 11 3 49 2019-11-03
#4 1 4 67 2020-01-04
#5 1 5 59 2020-01-05
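One detail worth checking: the question's data has 366 rows, so it presumably contains the 29th of February, which exists only in 2020. The following sketch uses made-up rows (`toy` is not the asker's data) to show that the Oct-Dec/Jan-Sep rule keeps that day valid, whereas pairing it with 2019 should come back as NA.

```r
# 29 February exists in the leap year 2020 but not in 2019
feb29_2020 <- as.Date(ISOdate(2020, 2, 29))
feb29_2019 <- as.Date(ISOdate(2019, 2, 29))  # invalid date, expected to be NA

# Rule from the question: Oct-Dec -> 2019, Jan-Sep -> 2020
toy <- data.frame(month = c(2L, 2L, 10L), day = c(28L, 29L, 1L))
toy$year <- ifelse(toy$month %in% 10:12, 2019, 2020)
toy$date <- as.Date(ISOdate(toy$year, toy$month, toy$day))

# TRUE would flag rows whose month/day does not exist in the assigned year
anyNA(toy$date)
```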

How to identify (not remove) SETS of data that are duplicated? Dplyr or other solution?

I have data about Sites nested in Class. Each Site has a Time (timepoint) variable. The data of interest are Count1, Total1, Count2 and Total2.
I know there are whole duplicate sets within Class: across Sites, the values of Count1, Total1, Count2 and Total2 are identical at each Time.
Here's what I mean. Let's say we have Class 1, with the first Site:
Class Site Time Count1 Total1 Count2 Total2
1 a0QjvO281o1 1 8 64 4 34
1 a0QjvO281o1 2 16 64 8 34
1 a0QjvO281o1 3 16 64 8 34
1 a0QjvO281o1 4 16 64 8 34
1 a0QjvO281o1 6 8 64 4 34
And, I've noticed there are several other Sites with this EXACT pattern (or other repeated patterns).
Class Site Time Count1 Total1 Count2 Total2
1 zlG1VmpE6QQ 1 8 64 4 34
1 zlG1VmpE6QQ 2 16 64 8 34
1 zlG1VmpE6QQ 3 16 64 8 34
1 zlG1VmpE6QQ 4 16 64 8 34
1 zlG1VmpE6QQ 6 8 64 4 34
I want to identify, within Class, how many Sites have the same pattern, either by marking them or by reducing the data set to the first unique Site pattern. I would also like to be able to say how many Sites fit each pattern found.
So, here's the partial data:
df <-
structure(list(Class = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Site = structure(c(3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 6L, 6L, 6L, 6L, 6L, 9L, 9L,
9L, 9L, 9L, 17L, 17L, 17L, 17L, 17L, 19L, 19L, 19L, 19L, 19L,
30L, 30L, 30L, 30L, 30L, 49L, 49L, 49L, 49L, 49L, 54L, 54L, 54L,
54L, 54L, 56L, 56L, 56L, 56L, 56L, 62L, 62L, 62L, 62L, 62L, 66L,
66L, 66L, 66L, 66L, 86L, 86L, 86L, 86L, 86L, 88L, 88L, 88L, 88L,
88L, 98L, 98L, 98L, 98L, 98L, 33L, 33L, 33L, 33L, 33L, 128L,
128L, 128L, 128L, 128L, 141L, 141L, 141L, 141L, 141L, 153L, 153L,
153L, 153L, 153L, 154L, 154L, 154L, 154L, 154L, 274L, 274L, 274L,
274L, 274L, 291L, 291L, 291L, 291L, 291L, 306L, 306L, 306L, 306L,
306L, 309L, 309L, 309L, 309L, 309L, 336L, 336L, 336L, 336L, 336L,
342L, 342L, 342L, 342L, 342L, 396L, 396L, 396L, 396L, 396L, 413L,
413L, 413L, 413L, 413L, 418L, 418L, 418L, 418L, 418L, 435L, 435L,
435L, 435L, 435L, 451L), .Label = c("~", "A0e3A15Lh1d", "a0QjvO281o1",
"A0R2gEqRbTv", "A4J3Jp6KNz2", "A757EHpLOya", "A8kkDgEvEZV", "ab5F7MfRxZW",
"AcjfpLUXjwt", "admxsO3fTtq", "aEBm7REs6XS", "AEZgWxwdbd9", "AezXCsZxd2U",
"AFjm1YmnfyO", "AFTwI0xBM6e", "aGw7PyLMEkl", "aHNXoYj7uNJ", "AibLRYCSE4P",
"aitNX6Qxkon", "ajEqsuhE9fV", "aJFDh98Iahb", "AKG4BvCUVsF", "AMtGkXGugJb",
"aNczAtKAJsv", "aoY0wrz6qBF", "aOz3ikxG7qM", "aPWuF0rDfuJ", "aQrGXlhzEJB",
"ARu0wnYDkam", "As7tGowP84e", "AsqolR3dfgv", "atj39UeK8N9", "atmjKVCRnzw",
"aUhP7zZ7LPU", "aUMEQzUKI0K", "AuP8NAgS7Th", "aUyy9i4fwhS", "AVFW2vlGxds",
"awoAlwC06Go", "awxCmxmeea2", "AWYFb5fwcYb", "Ax2Q16uPW55", "AXO6R085bth",
"Ay6W05BTgDV", "aZMeFIlkevS", "B08adcYOEl7", "b5MVFPi1inY", "B7fffQm5omx",
"ba3kFfcKXNk", "bCK7hWM4bnK", "BDlYKSCaOIG", "BE3TZDysXuQ", "bErpy9bSZAV",
"Beu6pmpSDJE", "BgfNJiJlDrF", "bGUeQEEpq7q", "bgWDDBsRLIL", "bHwo17fsILI",
"bifefa8JnfN", "bIQ3gsw51RH", "bisxDvmwluW", "biy6fHoOcZp", "bK7yQP8LNkJ",
"Bke0tWeJyBr", "bKMNhuIYaYW", "blkWvfFDVm6", "bnaDFC8EVAo", "BNDeQ6sJctI",
"Bokks2ESodd", "BoKlS77F7Il", "BqLRDDu69ic", "bqoZAzbsajz", "BRlA0HkkMGM",
"bT501IhkxV9", "BTliRZoJs4i", "bTTf1R7zgRn", "bTZAPQPXgI5", "BUtglXWCjkf",
"BvcJEyVWsGG", "bVHpRZguCL2", "BVymUZcbCuf", "BwkVolONMBn", "bWtq9NnOoCU",
"c2YR2oDyx7t", "c3dhvyZuPum", "c3LYcysugey", "c46Q9ExLocA", "C52gwcl9fmp",
"c5IYnQ3M7dj", "c6yCKEAemfr", "C8uv1qapHmC", "Ca2rjTu7g6A", "cAsHVMiIVHT",
"cB7mNM1MNm0", "Cbboq0XBHn1", "cbUfMWJl9sK", "ccixNtjWLkf", "ccL7Esacksn",
"CgmvbI2pkyK", "cGvhZR5kDxQ", "chFA8wLA953", "cIb00kbYPgm", "cjoj6MxgfxE",
"cJrxpXipqCm", "cMR1ECoHpE4", "CmRKRa25mZu", "cnCuI3VeJKt", "cNUlz8NllVu",
"CoySgwRgeRE", "CpZyeEzz39h", "CqIH5ytvqTS", "cRbK3weaIO6", "cs2MtDT1y17",
"CSVVXoe0xGC", "ctEZrxoEucg", "CxCDdfOd0Nj", "cXzO64qne5O", "CZq12nSSyn9",
"CzTmTRr0krx", "d3F3FBUFtWi", "d3f8P40FxnS", "d3thFMLEOGr", "d3UA2wZLHlM",
"D3wXzwwrBE7", "D4Bb0bZE5eK", "D5BprGY8EIU", "D5F054OKtW4", "D9nOWZAX3yT",
"DAcTRfO0CNG", "DbjU3iBZtGx", "Dd4sp3zIfSJ", "DDC8Dws74Zz", "DEFzmar1QtJ",
"dEoQWkLavTj", "deVhoPko4Bh", "DFBDO1gXQwf", "DfdvXXyNSoV", "dGCqYO3Zi6p",
"DGDkUV76OgX", "Dgt3VcFh8rl", "DHdEugYqcEI", "Dhku9zrZoJe", "dHokR5oLiIl",
"DhPZWGceA1Q", "DiKXevYOYNB", "DJIgnE1QQbB", "dkR7YOB6UT6", "dKy3aHycCap",
"dl9g8UYxk20", "DLmEBtWqO9S", "DLza3NSQYUI", "dmUHnTHgfYg", "dnRXJOdEzdw",
"doRK8OhG0kd", "DQaEryfraV6", "dQk8ubXxXLX", "dQOwWKXxFeq", "DrHlSXIHalR",
"DrLeENdZwxX", "DRUaAOrybxb", "dSJcUkmJWvZ", "dSuHNzaRaSf", "dtDftsTowRA",
"DVF2BNdSzV9", "DW7NajJs9ry", "dw94DZyrpUZ", "Dxa8RiDlXB6", "dXBB3LIqhd8",
"dY1ATXbywBu", "DY3V0E6pUYD", "dYIdx3HoWbL", "DZMyvdZEDeB", "dZrjKdqCi1w",
"e2cMNKCnHOw", "E2g3H9rUdML", "e59NHDOFTWC", "E6KoR8hXk7P", "E6vLBntf9QE",
"E8PnLO9QRcE", "e9NQxtBNruk", "e9QjFd6fZ4I", "EAdX1JPb4Dm", "eCGBeD0uz0D",
"ECHaJeidpTR", "edLdPyMjbaz", "EefeXxr8yDS", "ef6tzAcpMeF", "eFB6BfJ2BTY",
"EjFYleP5G9K", "eLGdmsoRjWn", "ElmgbenqYn7", "EM5PauW0KWg", "EmhBF1JUw3i",
"enR40fiMtoo", "EpxhEmcMVXh", "EQpPsVwWvqz", "EQtHhnAYjJp", "erfgs35WGXU",
"eRNEYF9OfA8", "ERqjIjzKnNm", "EsdcJsyJTJG", "ESNgljw6VvC", "eSZjKIwHPYi",
"etyPfIkrlrM", "Eu1JrO8bBkB", "euFWewBZ5Xr", "EVaNkH5nz1s", "eXgA6Zfn6KQ",
"EXIi96SW1Bm", "eYPdhvwFirr", "eZ2NazTVbb6", "EzN8D82lOTp", "F03oK0VRgyk",
"f0WCSs2fwvv", "F3CHKWYM2Pb", "f3FoF8cpKiH", "F42k81lXXMO", "F8ZvmoAy2bh",
"fd5zuIbL3Qd", "fDN9KAuRv2o", "FdqK3U8rDRX", "fG2ws21A6Lj", "fgDQSAYp5pj",
"FGjbxwib4q5", "FgLXwaIGGbn", "FiqXUXkRHXr", "fiuesJ8f3xw", "fJAqAOFzB2b",
"fJmQ6P38mHh", "fJy2O3xh1fV", "FjZuMxKuYvb", "FKe5fQHbu8l", "FKuw35vjqRz",
"FmAQ159jI3w", "FMGmKkEOmV4", "FmuzZuFFMzD", "FMX7RNQIwYu", "FNUYBvpbWaA",
"fnzDrz05g0T", "FO80di9Jxuk", "FOKfyVchS21", "fP0XmUTTfks", "fpCA3TMnMA3",
"FPkj0JvlmyK", "FPSoejJAWSU", "FqkwtkM7eXB", "FqlQZiGKxpr", "FThJa71HEEs",
"FVaQ3fSHtT5", "FvQrsd2gVeu", "fx7bCRgdYic", "FxrH3E1ge0f", "fYtsyMj84LY",
"G0EID1cpxEB", "g4jJZ1SNP4I", "g7AYmMzlRL5", "G7hnxrBDXd2", "GCQVHCnV25O",
"GcYMteoIkw9", "GDCM1IWa7Zh", "gdsUTJnwdzb", "Ge7oZ5R4iBk", "gEff10Pq35y",
"GFPi9bpW3sN", "ggMEnqgD9kD", "gKR0a28tTp5", "gKRGyOXbpzj", "glEuzcZNWIM",
"GlWdTuycHxs", "glyDmwEFzrr", "gmmjFqs7MFB", "GMWgNQ8JB1r", "gqFwQOY1wSE",
"gqNh2d7WJva", "grKa7EwswRX", "gsIY3JD3iHh", "GSWPAgMxhy2", "gsX0auFXP9m",
"Gtef53Qyxrj", "GTQqEhUUV1F", "guGv3PY445Z", "gUve5bZAut8", "gVZ58EQOH6K",
"GwXv8OX78AT", "GXIQmznIdQe", "GxIVLRDNmVF", "GziA2Vc0HX4", "h0RMK448nhs",
"H0vjaO76Wg8", "H1G7wWYemSm", "H3mOm6sbODE", "h4IQGhyYAQp", "H6LR8zRVQLW",
"hAoSAyLR3I6", "HB6ZBS6kyJ8", "HcKIEHFgpDb", "hCuRPOStRLU", "HdTW2XJg3IO",
"HdxFUpXFp2O", "hFwwNnFm1B8", "hHMHykeQBua", "HI3Z2eSmWYl", "hiRGzSqrLx5",
"hjeei4JLTiF", "HjwC2LDSWHK", "HlElMRh1t6W", "hlIZJlEsd7B", "hLwLFwQgUdb",
"HmIC1eI4aEQ", "HmuBn2Tdutx", "HN6AdgqShbf", "hoSu28MRYPv", "hq6x4qBOYsg",
"HQHoA9YKMAI", "hqvimuJJhKL", "hrpWiEmnynY", "hsLoXTDJDib", "htJFOM9EYmH",
"HU4RdTNlezp", "hWWRAoV26mI", "HXA0U1WlIhx", "hxckGietsww", "Hy4Uo9AjrnA",
"hy52ywnDIAM", "Hy5stTfQzCG", "HZd0k5dqZ9h", "hZV0CekLNni", "i0rzEGmhViY",
"i0UbyVCIMMY", "I21MUYJoVMy", "I2G30Bxw2BX", "I2tQnsS7wn6", "I3n104WlitM",
"i3UCGccuhCZ", "i4KTQ0RGK3T", "i5GWQwiObW1", "I5NWo4ucWB3", "I6v4GYaXpQC",
"i7xMMyJ6A6E", "IAvpgvgrG0f", "iBB477oQopG", "IBhZ0h4Ap4D", "IbltT4i4TK6",
"Icts0NC4qAd", "IesVnrPQeSZ", "IFINQSPg4YM", "IFTZCzzniHQ", "IFvY9G1PHAV",
"igDf6uUnTYe", "iHIs3hIFf0i", "IHWMvXnrYmQ", "Ii0xFlLHHXz", "iI2i5pPbl5B",
"Iiwy3Zv7iLb", "iJax0w1KHEN", "ijl4gbKzr3X", "IJwB2CRmy7D", "IKMMHGYtcDC",
"ikpa1wjF92j", "iL8UKqtpf9G", "ILiQ2JLmcLT", "ILJAF0UeEJj", "in5GYhicsOP",
"INcVgc44sm9", "ioVTytF5utn", "iPY8yPbKyA0", "IQIfv1gEqzC", "iqKq6QyUII5",
"iqopOI7y0N3", "ITafa9GjY9I", "ITzEvGOU2GR", "IuymlqNZCLI", "ivq1Bh0PvUd",
"iwrIeTg1XFz", "iWvqk82htTQ", "IxcUubx1fw5", "j2k93SJevE1", "j2X8kPMcchC",
"j6UnkDFKZc1", "j7218NqxjYe", "j8DdqpZn2qc", "j8FYrPT09Sd", "J9JOpPQB23Y",
"jaDbDaXw0Pc", "JcZ2R7KZzTq", "jdswhtT866l", "JE6sdkvuc9S", "JeSc2hThLHY",
"JEWdR4I9TIm", "jf0RxRXJQD0", "jFFOiUs7WoZ", "jhngb8KdYU1", "jiIV8o3C0qx",
"jJ1tYGFTuaR", "JJD60zjyHFp", "jKg6rpNATKH", "jlaaYySSxTv", "JlEPa3N6EgO",
"jlZ6LAYKEo9", "JMhFN7V0B1r", "JMr6AvPnW1M", "JnJtmnGCY95", "JnsP1SLvvsw",
"jOl9gZtASeV", "Jq4XG5c63t1", "JqfwjhLrHs7", "JrxejHLYDML", "JTNDUJAu3DA",
"jUtaZ7I8azt", "juWqrHQgdew", "jVb0CSg6sIR", "JVHpkK4exDw", "JVk9m9vVA1D",
"jWFefvuCwnA", "jXoQbHS18G7", "JYfu3Ld3AuN", "K2Lh8hkI6ST", "K3RIalye4fw",
"K3rIsFyLwv7", "k6fqIh47UYc", "K7re2lFVRfv", "K9HNTtT80IM", "kAQQIuh4eZr",
"KbEhvcWmvAf", "KBMxpwB6DCO", "KBybjbIp9VK", "kCdAI1b02G6", "KCPICjUZcE4",
"kCQMO6wkkV5", "KCtzRrOqmal", "kdDCRlEWqYr", "kdUL3XxL1bF", "kdXwwhZfS7V",
"kEeOSZheoND", "kEhPOqEXXk0", "kGE4jAoYn5L", "KHXn2gzpI0j", "KjMGcLd3XXd",
"kK6NYM3jZkd", "kKsL2QkNR4K", "kl6QWeL9RDW", "klThMLasoQV", "KmfuUMQ7T93",
"Kn9F1mXO0GV", "KNU8WQL2zSc", "KP6O1BkuoPX", "KPF6QKOADPR", "KpV6xl78isl",
"KqyKD3POUbS", "KQYxmgQNUSD", "KRQ61nuKa1b", "KtDVkM6bDeW", "ktTYjYLEW3v",
"kubDpNzUTG7", "KujnNfVcY2N", "kVJ0jf7P7Wf", "kWBZ1e0JH5h", "Kwts2m2rUUp",
"KxEa3dXzAYv", "kyGz0JzX3Z0", "kzHnYcum1wX", "L3iJ4hZ2ypn", "l7dBO27dhA6",
"l7RKRoGgmlq", "L7xlpOoRnWm", "LaH8j5yWJZ1", "lawU1EpVZVc", "LBEkbl9SzHf",
"lbvPWYrpTPw", "LcWVIO0Jsqj", "LDmpwdWKomn", "leQOMrPQiqf", "LFOfMnjCDvJ",
"lgEnN00o6mZ", "lGgWFnakeII", "LHie5mY8Uj8", "lIEtVHeJ086", "LiLYwGv2WWN",
"lJ41xkkb1jI", "LJFDVm4S9HF", "LJzqA45qmSZ", "llQAyMkWXID", "LmBKIXa2mSL",
"LmwBbNZehh2", "lnkTmWmupfH", "lPAr5SfstTF", "lpCdKHJgyDr", "lQfxQMSOVqP",
"lS1XvFsr6no", "lUDMkJxSxHL", "Lw70k8Wjzp4", "LXWKW1xwmoZ", "lYNYlzUvgos",
"LYZ27cymGw5", "LZ1OWhYhPiZ", "m4ue4ZOdIep", "m6E2SxuEKtc", "m7fmNp4WilZ",
"m8FGZ1tP0UE", "M8kI8XD6qF9", "Ma2YKDqULAr", "MA3CYGbUEaG", "MAk4KZRu1L9",
"MAtmMxsNpeZ", "mC01s0xdGEm", "MCE5Y33BYDN", "MCT0SGxhkuU", "MdmyzozNJ02",
"mDNJnXJ3Bap", "ME541MEplIz", "ME9FWjRMe4e", "mePQU0trYhJ", "MFT0CnzHbgk",
"MFy31o7euAb", "mfZwiJJpZcR", "MgptQftlksp", "mgUgOViogq7", "MI2vOsP8NSo",
"MjCkEceL336", "mJY0L6TiTId", "MkU5WMbgI4U", "mKYg307awDr", "MM5BhvP1qVK",
"MM5CMbf9hxl", "mnshKO7lVDt", "moicbsA41fH", "mOSub2ULY1O", "Mpi4Xzop4kw",
"mPQwmRVhsKK", "mpxTG4BSHvb", "mR9nchmQZXC", "mruhLKuBF86", "mVZB3R5M66F",
"MW1EtjyMl5d", "MXHQSQfyHl2", "My1mHzVMqV6", "Mynld4Vekod", "N1giIHXfzhb",
"N23VxXj21Wv", "N2gVM6xHjXX", "n33C6ztvpqu", "N3LQS3eat8p", "n46vbqoLchh",
"N4rlgJRGUs3", "n5H2FaL7kap", "N5PPLwwES0c", "N6CPQoLRnz6", "N8nfcWXZtit",
"NawPD8q2KC8", "nChFLgqqH0w", "NCqjtm01Y4E", "NdMiR2VVel6", "nfR5nCiNHMC",
"nfwoSSAiWjg", "nfWs6WgmRC2", "nG7qJqJR13Z", "NGHkoHvBwF0", "nH6JZBFhCXs",
"nhfdWznpsqJ", "nhnQpVPQ7zK", "nhsj9HCnhEs", "nHtTsUMZoVG", "nIhIdZmXLXS",
"NIsmtALRuS5", "nj2KML2oqvV", "NJKcpotvrAQ", "nkXtOreJnSJ", "NLBLC0uWFuB",
"nmdSUueCjti", "NP8pgYnty0q", "NQxDKw6jGTj", "NSZxDwLVCeC", "nUanptGavqT",
"Nv5WX50ktwr", "nvJQYEQIFFM", "nvXHNeXXvJ5", "nwbO0NqAg7S", "nWJFiQq1vDL",
"nx2J294i6hk", "nxgu0uT1tLT", "NxKCqlm0eTG", "NYEpdnELJ54", "nYIBsKHueFr",
"nYnOM20f4fb", "NZxaguajfAY", "O1U2KTQp7RW", "O2p0zdfIFmP", "o3nzTkLC1Pl",
"o3pKyi7ckFO", "o4gtcJidna5", "O4slz8eLLn6", "o79rSRM0UlM", "O7qGvpaAt2w",
"oByIGUGsrgx", "od9Sosf2Y0V", "oDTFc2FqImi", "OdyuvCVU9Hz", "oEFK7vjkTU0",
"oEXOZcbaHxA", "OgLIyzin181", "OHtxRBRAzYs", "oJNbeCd6bvb", "oJsgj7WMDkq",
"OLEt9ovMHrz", "OlkZe7ivV0p", "oN0anW8xCpq", "oNDzB1D5as4", "oNfV9ntBJ9u",
"oNttkuJFbwC", "ooElCfPc54o", "OpEVn6IiULE", "OQ3BQRswMx7", "oTB157EY3jY",
"otmVyzT3xRC", "oUWkMygGP2W", "owxf1XoQ3Lu", "oYgWYWUVt2h", "OYjhvD7DqIP",
"oZfnfo46pS4", "p1NV2hE2fCZ", "p25NocgpHkc", "P2eQdjxbuZo", "p3T3oB4tfNN",
"P3Uob5UKAoM", "p4hBFnI8WIp", "p5L7w9Tjay3", "p7C2DczQikw", "P8tFheT6TtS",
"P91Rf8wCj7Z", "p9J64kFu5Fd", "PDOfJJdpbob", "pdRTIO2JqPL", "PDWC7RxX4t9",
"pEAFBcOJIVF", "PEfq6d3TONP", "PeNS8yHqYH1", "pEvaEn24SR1", "pg9F69FU9fh",
"Pi6v7zcA26e", "PibIwh4xKHI", "PicYz4ZaEkF", "PIm96jtkVB5", "pIVjHCsQgJI",
"PJI3sARzQAG", "PK027w8aZ5K", "PKfz9RYfKzF", "pl8h1HdqpFW", "pl9IGnhmOJc",
"PlISiBPN3db", "pMiRPEvyleJ", "pMtEAU5iVTB", "PnB0GLiMdBm", "PPb3XMcCAf3",
"PSdLvfFlDRF", "pthlRKVLgNp", "PTZfXfOkUR1", "pWmPB9No5RJ", "PWXwPbUM2DB",
"pxPQCkuJZrl", "PxXh1I86blw", "Pz198xRjRHD", "q2UUKkPtvll", "q4hyZcb2pgA",
"q6ke2WlwbWr", "Q75pcfnDLwr", "Q86baYhZPOB", "q8fmqtJVDhh", "qBrBhSbFC0d",
"qc9eMgI8Y95", "QCY2lUMpt7f", "QDkCAOGVng6", "QdYKp8ivavV", "qeBFicifeNz",
"QeKGz2D6wNe", "qEt7nmwua6v", "QGJz6Rv3qHU", "Qgzh7S5pLc3", "qHaaYvuNGIB",
"qiBueINJbti", "qimfq5GL5mV", "qJsVouyMqE8", "qlnxDl1BOrw", "Qlt1DOyb7iP",
"qm0fcx7VGOQ", "QMT77ObrHQa", "QOyCdSRSUXL", "Qpj3LVa0kMf", "qQ84fCTxdGh",
"QRaKmOedEZx", "qs9EipoiiBD", "qsPQEZph59z", "QTFJClMfP8c", "QtJyTjN5faU",
"qU7z54bY9jA", "QvByLV2hsHo", "QVFUUfes7vc", "QvQ5bpVOJDj", "Qwzbgh4Flmx",
"qx2DdF2CKFL", "qXdueHJNqcv", "QxSfgx5QfT7", "qxuRrLWQmXL", "Qztk8cjmz1e",
"r0ehsy1jjxa", "r2w7bZu3FsL", "R3ac44RpwRG", "r4mXVpHUWC7", "r6p12UeHOyg",
"r9efDheFtk3", "rakWSnvNhWr", "rbBZoYFr4DM", "rBtlT7YCRKx", "RdbYAXOnm2S",
"RdM4hjZsFRg", "Re2M8SlCc98", "RfmkqgjDUPL", "rgAmPaAHmNU", "rGbQXTyOdmW",
"RHpQbDCZK5O", "rhxxSbYXZRR", "RiIZqF2hfqY", "rIR8cwAz0sf", "rJ3tipUjVQ4",
"rlAmYWNUTnR", "rLiYzJJRiBA", "rLOyzoOdZqC", "RMKAo2HcVkM", "rnGH1Q5IyIU",
"robJRJuEFfM", "RovRnV9RWFd", "rpmWXDmHjsq", "rPPdTvv1QoY", "RqLdtXwHdGO",
"rR1aDWav3z1", "RrjHJQJDQSr", "rrZEwHEjjy8", "rsM3sdDc3Lk", "RsmDQZSmpD7",
"RtK3aS9WP2H", "ru8BHTnYxI3", "RU8DlKBg48x", "rUysfjKrKqk", "Rv3o89GkqWH",
"rVC8KePJHu3", "RvCLp5qbvtz", "RvQqAbOcEfA", "RW617O0UjQJ", "RWvmueaioAl",
"RxADuUq1Ba1", "RxHTSbz8VN5", "RxND5KsxzvW", "RyRJf2UHJL1", "S1Rh4YnCAAZ",
"s28njgt1wYe", "s4eb8Spa5TC", "S6gaiIWGmh9", "S6X4d5WHA1H", "sAnH4cWV41G",
"SATZgjyfpdZ", "sAyk7hwXEbV", "sBu9GwU5IKe", "SdlDgZMNxqX", "SegMIAP4dhw",
"SfB5NwJXaot", "SfPGp94cYZa", "SG0QMcMgRRq", "sG2EfH7UYLQ", "SgK14sd0Fq1",
"sgNOxONNZIv", "sGXcrRdwzAk", "shkTq2LdpXw", "si6qmHhCV9F", "skR6XpFhu3u",
"skuXY545bae", "skVM9VC2v6H", "sLkylFDaonQ", "sLQ3GDMCRSz", "sMVuTESYbpd",
"soMCF3RbHqt", "sQhxc449PV4", "sRbFOoSk7qZ", "srTptJGYtcK", "SSS1hmwqHOR",
"StSjQheznIv", "SvLVieXqQT7", "SVR6pSBhbCb", "sWlH85siDIT", "SWTCBn32M8D",
"sYBdL54a73r", "t0V5NCdjdPi", "T15MpYA7f51", "t3snPDHuVBW", "T5LdflE3Peq",
"T6RUeMH9KP0", "T6VbSgxjG4o", "t9Fl7c8SJbm", "TafeAKXESCA", "TBMPJiR0PKA",
"tcjz9dmJW4y", "tDDh1EjIZkh", "TE62MxBLgne", "TE7dhvcKVwp", "tEiDKptkacd",
"tEr481bYdow", "tfEtbnUgkGv", "TgpNd1eUCH5", "tGV21Z1HgXN", "thQZhxRh887",
"TJp862VOKlS", "TK7P7QXIDOA", "TKb3FP8mXY0", "TL5cvVAN3cA", "TM31sX4CThP",
"tMpwPcDzIfU", "tNf8m963xKK", "tnR9XvFJ5d7", "tNu7AdZ5358", "tOEYJ1EgIkn",
"TqSXqCuyodR", "tRgUTgCKu4J", "troIuBzxemz", "TSQWaAvOer5", "TsSlV9eE7Mz",
"ttKnsfno2BN", "tvoTu4cpYbh", "TWJPFfCeHES", "twyDPmlDNjH", "TyjDUvHkCAx",
"TYYPCGssY7i", "tz00ETYw78Y", "TZ307ap3HvE", "TzPwGs1AcCL", "TZxEGcWjbdk",
"u0ezFwC4OLL", "U3DjjRVyEun", "u45lZujojLF", "U6Mo4GsQKwT", "U7jt55boMwC",
"U8feQBluEhj", "UBe2SLdSmxV", "uBjjsyieqtr", "UccWk7OAtZ2", "uDXFpf8Ko6P",
"uE4KejhmDyk", "uGfkThgxZsI", "Uih0KGtvZeo", "UIyI4hkq7Bx", "UjoXPWJKPXb",
"uKFFT93nPmp", "UKSoohp2vBC", "UL70316n0C2", "UlD5QNXAW40", "uLDFnAy4ro0",
"UNxoCz1KXnW", "uOmh6keHjf6", "uormVxMEerw", "Upe0kYdbeUy", "UPSbASHNQmU",
"UQ1K5VqXqcZ", "uQvg5rWo87I", "usFB6MgBB6t", "uTeZmtXQzSN", "utgv86YyClH",
"UTmdWR44H5x", "uUmAJIXkmsO", "UUsAfkqIPhV", "Uv6Baj6YaG1", "UV9ZR51T6Ts",
"UvVxiC7b1jZ", "UW4ZNlm05Jq", "UxEq7311Xzd", "UXhcOzwv9o5", "UXSSmcXoWR8",
"v15yxuZyGjR", "V1MbBFGqwbB", "V5LD5oYeZys", "v6BprVsEEt2", "V7Hl62C5Wgz",
"Vah8YYh5HI5", "vbDOTEMQjfW", "vBjsjEqsmWL", "vBym1l507tA", "vf1kkxsjkB1",
"vFSbE4W5Kg6", "VfZPt9kXxL9", "vGLQ19KWuBv", "VHK1T5sygmw", "VLuN2iZ9oZp",
"VmwVU8HFDBn", "vnaUuR9C4FH", "vniKeY4S1Ru", "vOu023c0Snx", "vOuGO9bkEUa",
"vrQvRBzXiLv", "vRRoviRJVgX", "VS1mxlo1mVx", "VsFXXXagVmp", "VSHfWQyhzUu",
"vugElbQMtcL", "vvaX4oKLyKo", "vw87QIZ7dhk", "VWLVmvtDCSI", "vxXQe9jxSPE",
"vy0hyVTrTom", "Vy1JFQbsNBB", "vzGc2nPWraO", "vzVRv2jtJxL", "w0aCC4wNNzW",
"W1wtZLbWuY0", "w2yXiR4CyWt", "w539HzekPQh", "w55gRgLikEN", "WBhss2tvLa8",
"WcPEy9epMgd", "WCSGolF5yhy", "wdcS5ORWZte", "WDyq0ryAjpn", "webeuXrveDi",
"WeSJR8GDPmC", "WFApCUf18Lp", "wfFCmvMEGOQ", "WFiPvuGJf9O", "WggRnJplCQI",
"wgqFTVU7Iky", "wIMmZwl1gpX", "WjCGPzMzLVr", "WJfiDULf7ZC", "wkl1yyAzga3",
"wlspYUyDoQM", "wm060hpEM7g", "wMPB6u0GZDL", "Wn07Tbv74qp", "wNha3idA7l6",
"WnZVpXq5XCO", "wOe4JHkqbUm", "Wog7gclb7TJ", "wq4bmXnJK45", "Wq4O1nlYk1C",
"wqUwUpMD2mJ", "wrGYa8E94Yc", "WSAfRmiEJOF", "wSP90pEfCng", "wSW662GVwZP",
"wtoXU3G9YIy", "WtPSqPwjH2f", "wtV2TtEPCCZ", "Wtw2jbyaHz2", "WUChzooYWJ1",
"WUFgPdTN02g", "wUQiuRjZxiO", "Ww9Rq2KLlqV", "WWabB2sc4B7", "wxKEHpSLvib",
"wXnoTA2MDy9", "WYk4A1fVYD7", "WYMXHupBG7P", "wzD83xmvR3b", "WzemydwRD0R",
"X4ZVDdDd2xa", "X6efCWparbb", "X6uv3PName4", "X7deWPhTiIy", "X8TsrtMQFiu",
"X8UmaBiq1yy", "xbJCVaOZWp5", "XbjRzgMPN24", "XbubJh2yjOw", "XcqBCAaLcq5",
"xd8LIlN7N8h", "XdKVljaiZ9j", "xeEUMp35d5m", "XeUDpg1CTKf", "xf9Q4yYDlq5",
"XFEHZnnEGkT", "xFO9GKAXi1n", "xfxtwRZ7Ejp", "xhOpIbHQy8I", "XjBkSXvZLOZ",
"xjfIPJ04cET", "XLt8l1uPicg", "xlYle4v5GZ8", "xmJNiAbmSfe", "XnkRi1jTMKr",
"XPhxWI0fDyq", "XqDQsrhQ7W5", "Xsd3yzbnFOf", "XTF6vymtG8J", "xuovzIjWZUG",
"Xv1I8z1cK76", "XvVmyn071HT", "XxBMueAFsnk", "xxVZKlzMYJJ", "xyr4dO4G3tW",
"y4rr2PbfufS", "yaa2uBLsdRa", "YBG39jGSV17", "yDcnCB4aZEX", "YDuoFIKpONe",
"YdWxRCaQR2D", "yfgSogitBGX", "YFi06xiFHWs", "YFi2V7qfmJf", "yfpM2zJ3Zuc",
"ygTl7hih5qi", "YGtrgJxKWiU", "yIcfnuZhejK", "YIxt0WtezdT", "yJ014QFEqru",
"yJO8QTnBF3o", "yKfdWuLsdDx", "ylMgcLnwgce", "YNy9ymD2A8p", "yONz8gph9A7",
"YowwYq8CIXJ", "YPsxC0bl7T2", "YQP6diqjJAl", "YqR6LoSk2Ed", "yqwh11CvYXU",
"YRemZ3p9bFA", "ySxRSgTOeqD", "yTvx2IJ0w0z", "ytwga9hKjVj", "YtyO06HBaVr",
"YvEkkZlNeCK", "yVFdJkYsLK5", "yvoQHXHGvbT", "YVT9zsaVBzp", "YWbmL6VK8R6",
"Ywm8eA9tZHe", "yXady1QV27H", "yY7MHufA6C9", "yYG52aLO1GK", "yYgG4h097xR",
"YyhPAO5yx22", "Yz5yhyHf7Ul", "z2cGjpx37Mw", "Z42m6cWsI9m", "z4DptoHrJnb",
"z4kLOdnL1Op", "z5tZes2s49Z", "z5WklS85YjT", "z6bId6qlNk4", "Z6ZZLw50mAM",
"z8MwD6T43n2", "z8UkGdr2xNs", "Z90jET09ZrD", "zaeb1Zos2Mu", "ZBkpY2KdibX",
"Zc0BcScQDBU", "zCjn57zZQVN", "ZcrdEBruDka", "ZCT4YbaBFUb", "ZdVIx83rdI7",
"zEQXA689E4a", "ZfjQmCjVKRF", "zfutn6ulVcO", "zFzYdXMnPoP", "zG4JqtM8wHO",
"ZGyAErBl5PS", "ZifoCg4OvIj", "ZJ6MAab9PJE", "ZKVzRmYkKzQ", "zlG1VmpE6QQ",
"zN6xXPgmzqK", "zOfDRrZmbQO", "zOGa9wLHDFE", "zQmuipEUYbz", "zR7UekDUG3X",
"zrs6iFpEtF1", "ZrUjQFzR1gM", "zTnxsAMqHRP", "Zu7gpmcwfqY", "zvOkAI9ewwE",
"zvv07VAowTS", "ZWAdop7zYgJ", "ZWAEE8DrywN", "zxIlF5RwQFi", "ZXONCt7P01p"
), class = "factor"), Time = c(1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L,
4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L,
6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L,
1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L,
2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L,
3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L,
4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L,
6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L,
1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L,
2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L, 2L, 3L, 4L, 6L, 1L),
Count1 = c(8L, 16L, 16L, 16L, 8L, 12L, 24L, 24L, 24L, 12L,
8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L,
16L, 8L, 8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L, 16L, 8L, 12L,
24L, 24L, 24L, 12L, 8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L,
16L, 8L, 12L, 24L, 24L, 24L, 12L, 8L, 16L, 16L, 16L, 8L,
12L, 24L, 24L, 24L, 12L, 8L, 16L, 16L, 16L, 8L, 8L, 16L,
16L, 16L, 8L, 8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L, 16L, 8L,
8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L,
16L, 8L, 8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L, 16L, 8L, 8L,
16L, 16L, 16L, 8L, 8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L, 16L,
8L, 8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L, 16L, 8L, 8L, 16L,
16L, 16L, 8L, 8L, 16L, 16L, 16L, 8L, 8L, 16L, 16L, 16L, 8L,
8L), Total1 = c(64L, 64L, 64L, 64L, 64L, 96L, 96L, 96L, 96L,
96L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L,
64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L,
64L, 64L, 96L, 96L, 96L, 96L, 96L, 64L, 64L, 64L, 64L, 64L,
64L, 64L, 64L, 64L, 64L, 96L, 96L, 96L, 96L, 96L, 64L, 64L,
64L, 64L, 64L, 96L, 96L, 96L, 96L, 96L, 64L, 64L, 64L, 64L,
64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L,
64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L,
64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L,
64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L,
64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L,
64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L,
64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L, 64L), Count2 = c(4L,
8L, 8L, 8L, 4L, 4L, 8L, 8L, 8L, 4L, 3L, 7L, 8L, 8L, 4L, 4L,
8L, 8L, 8L, 4L, 3L, 8L, 8L, 8L, 4L, 3L, 7L, 8L, 8L, 4L, 2L,
4L, 4L, 4L, 2L, 3L, 5L, 8L, 8L, 4L, 4L, 8L, 8L, 8L, 4L, 4L,
8L, 8L, 8L, 4L, 4L, 8L, 8L, 8L, 4L, 3L, 6L, 8L, 8L, 4L, 4L,
8L, 8L, 8L, 4L, 3L, 4L, 6L, 6L, 2L, 2L, 4L, 4L, 4L, 2L, 4L,
8L, 8L, 8L, 4L, 4L, 8L, 8L, 8L, 4L, 4L, 8L, 8L, 8L, 4L, 4L,
8L, 8L, 8L, 4L, 4L, 8L, 8L, 8L, 4L, 4L, 8L, 8L, 8L, 4L, 4L,
8L, 8L, 8L, 4L, 4L, 8L, 8L, 8L, 4L, 4L, 8L, 8L, 8L, 4L, 4L,
8L, 8L, 8L, 4L, 3L, 8L, 8L, 8L, 4L, 4L, 8L, 8L, 8L, 4L, 3L,
8L, 8L, 8L, 4L, 3L, 5L, 7L, 8L, 3L, 4L, 8L, 8L, 8L, 4L, 4L
), Total2 = c(34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L,
34L, 32L, 32L, 32L, 32L, 32L, 34L, 34L, 34L, 34L, 34L, 33L,
33L, 33L, 33L, 33L, 32L, 32L, 32L, 32L, 32L, 16L, 16L, 16L,
16L, 16L, 30L, 30L, 30L, 30L, 30L, 34L, 34L, 34L, 34L, 34L,
34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 31L, 31L,
31L, 31L, 31L, 34L, 34L, 34L, 34L, 34L, 22L, 22L, 22L, 22L,
22L, 16L, 16L, 16L, 16L, 16L, 34L, 34L, 34L, 34L, 34L, 34L,
34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L,
34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L,
34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L,
34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 33L, 33L, 33L, 33L,
33L, 34L, 34L, 34L, 34L, 34L, 33L, 33L, 33L, 33L, 33L, 28L,
28L, 28L, 28L, 28L, 34L, 34L, 34L, 34L, 34L, 34L)), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L,
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L,
68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 1041L, 1042L, 1043L,
1044L, 1045L, 1046L, 1047L, 1048L, 1049L, 1050L, 1051L, 1052L,
1053L, 1054L, 1055L, 1056L, 1057L, 1058L, 1059L, 1060L, 1061L,
1062L, 1063L, 1064L, 1065L, 1066L, 1067L, 1068L, 1069L, 1070L,
1071L, 1072L, 1073L, 1074L, 1075L, 1076L, 1077L, 1078L, 1079L,
1080L, 1081L, 1082L, 1083L, 1084L, 1085L, 1086L, 1087L, 1088L,
1089L, 1090L, 1091L, 1092L, 1093L, 1094L, 1095L, 1096L, 1097L,
1098L, 1099L, 1100L, 1101L, 1102L, 1103L, 1104L, 1105L, 1106L,
1107L, 1108L, 1109L, 1110L, 1111L, 1112L, 1113L, 1114L, 1115L,
1116L), class = "data.frame")
One option is to group by 'Class' and 'Site' and paste (str_c) each column except 'Time' into a single string per Site. Then group by 'Class', 'Count1', ..., 'Total2', use the group indices to create the 'ind' column, and left_join the result with the original dataset.
library(dplyr)
library(stringr)
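df %>%
group_by(Class, Site) %>%
# one string per value column and Site; "_" keeps e.g. 1,16 distinct from 11,6
summarise_at(vars(-Time), str_c, collapse = "_") %>%
group_by(Class, Count1, Total1, Count2, Total2) %>%
# Sites with identical patterns get the same id
# (cur_group_id() replaces the deprecated group_indices())
mutate(ind = cur_group_id()) %>%
ungroup() %>%
select(Class, Site, ind) %>%
left_join(df, by = c("Class", "Site"))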
Or a similar logic with data.table
library(data.table)
setDT(df)[df[, lapply(.SD, paste, collapse=""),
.(Class, Site), .SDcols = patterns('Count|Total')][,
ind := .GRP, by = c('Class', 'Count1', 'Total1', 'Count2', 'Total2')
][, .(Class, Site, ind)], on = .(Class, Site)]
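The 'ind' column marks which Sites share a pattern; the question also asks how many Sites fit each pattern. A minimal base R sketch of that count, assuming a made-up data frame `toy` with only two value columns (the real data would use all four):

```r
# Toy data: three sites in one class; s1 and s2 share the same pattern (made-up values)
toy <- data.frame(
  Class  = 1L,
  Site   = rep(c("s1", "s2", "s3"), each = 3),
  Time   = rep(1:3, times = 3),
  Count1 = c(8, 16, 8,  8, 16, 8,  12, 24, 12),
  Total1 = c(64, 64, 64,  64, 64, 64,  96, 96, 96)
)

# Sort so that every site's values are pasted in the same Time order
toy <- toy[order(toy$Class, toy$Site, toy$Time), ]

# One signature string per Class/Site, built from the value columns
key <- paste(toy$Count1, toy$Total1, sep = "/")
sig <- aggregate(list(pattern = key), toy[c("Class", "Site")],
                 paste, collapse = "_")

# Number of sites sharing each pattern within a class
n_sites <- aggregate(list(n_sites = sig$Site), sig[c("Class", "pattern")],
                     length)
n_sites
```

Here s1 and s2 have the same signature, so that pattern gets n_sites = 2 and the pattern of s3 gets 1. Merging `sig` back onto the original rows by Class and Site would mark each row with its pattern.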

Box plot for groups in R

I am having trouble making a box plot for different groups side by side.
dput(df)
structure(list(UserName = structure(c(20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 3L, 1L, 1L, 3L, 3L, 26L, 3L, 29L, 2L, 29L, 7L,
10L, 2L, 10L, 10L, 6L, 30L, 2L, 2L, 1L, 1L, 3L, 16L, 10L, 10L,
6L, 10L, 2L, 6L, 29L, 6L, 1L, 4L, 17L, 5L, 5L, 5L, 5L, 14L, 5L,
14L, 5L, 24L, 23L, 23L, 28L, 25L, 28L, 28L, 28L, 28L, 28L, 28L,
28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 31L, 31L,
4L, 27L, 27L, 27L, 12L, 12L, 12L, 12L, 19L, 19L, 22L, 12L, 11L,
11L, 11L, 9L, 22L, 12L, 15L, 22L, 22L, 22L, 11L, 9L, 11L, 12L,
11L, 18L, 18L, 22L, 22L, 18L, 18L, 19L, 22L, 22L, 19L, 19L, 22L,
19L, 11L, 19L, 15L, 22L, 19L, 19L, 9L, 19L, 19L, 9L, 18L, 12L,
18L, 22L, 8L, 13L, 13L, 13L), .Label = c("CYL", "FAL1",
"GS", "HA1", "HX", "HURRT", "KWY", "LEI", "L1",
"LIGYR", "LYC", "LJ", "LQI", "LIC", "LOK", "MDA",
"NMZ", "NGK", "OXJ", "P_PT", "P_SH", "PDI",
"PONN", "PEHMB", "TGT1", "TNS", "THOLH", "TOT",
"WAN1", "WAK", "YH"), class = "factor"), Division = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 7L, 7L, 2L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L), .Label = c("BATCH",
"BTR", "IIT", "POL", "PTC", "PTP", "PTQ", "SPL", "TM"), class = "factor"),
SpoolUsage_max = structure(c(20L, 21L, 22L, 25L, 26L, 27L,
29L, 33L, 34L, 39L, 41L, 43L, 47L, 48L, 49L, 51L, 52L, 53L,
55L, 57L, 58L, 59L, 60L, 61L, 81L, 82L, 83L, 87L, 99L, 102L,
108L, 108L, 141L, 143L, 155L, 158L, 160L, 5L, 8L, 90L, 94L,
96L, 98L, 104L, 110L, 111L, 112L, 113L, 114L, 116L, 117L,
118L, 120L, 122L, 124L, 126L, 127L, 128L, 129L, 130L, 131L,
132L, 134L, 135L, 136L, 137L, 138L, 139L, 140L, 142L, 144L,
145L, 146L, 147L, 148L, 149L, 150L, 151L, 152L, 153L, 154L,
156L, 157L, 199L, 201L, 203L, 204L, 205L, 206L, 69L, 70L,
71L, 72L, 73L, 74L, 75L, 77L, 78L, 80L, 9L, 16L, 16L, 17L,
23L, 36L, 42L, 46L, 46L, 46L, 50L, 56L, 63L, 65L, 89L, 97L,
101L, 125L, 172L, 174L, 174L, 184L, 185L, 186L, 191L, 196L,
207L, 4L, 6L, 68L, 106L, 107L, 35L, 10L, 37L, 95L, 175L,
175L, 188L, 189L, 198L, 3L, 24L, 91L, 92L, 40L, 40L, 44L,
45L, 103L, 133L, 178L, 194L, 195L, 200L, 7L, 66L, 164L, 165L,
166L, 167L, 168L, 169L, 170L, 13L, 14L, 35L, 100L, 119L,
123L, 18L, 54L, 109L, 79L, 9L, 11L, 15L, 18L, 19L, 30L, 31L,
32L, 38L, 54L, 62L, 64L, 84L, 85L, 86L, 88L, 93L, 109L, 115L,
121L, 159L, 161L, 161L, 162L, 162L, 173L, 176L, 177L, 179L,
180L, 181L, 182L, 183L, 187L, 190L, 192L, 193L, 202L, 208L,
1L, 2L, 67L, 76L, 79L, 105L, 163L, 171L, 12L, 28L, 197L), .Label = c("1,002.12",
"1,027.99", "1,207.40", "1,368.90", "1,599.16", "1,616.11",
"1,804.20", "1,804.28", "106.09", "106.49", "106.5", "110.59",
"118.37", "119.12", "122.69", "123.19", "123.3", "123.49",
"125.19", "126.54", "128.72", "128.94", "132.43", "132.51",
"132.55", "135.45", "137.26", "141.87", "142.59", "145.93",
"146.11", "146.52", "147.22", "149.04", "149.27", "151.42",
"154.7", "155.61", "155.9", "156.07", "156.23", "157.8",
"158.92", "159.41", "160.22", "162.84", "163.45", "166.11",
"166.63", "170.96", "171.19", "172.73", "173.24", "176.51",
"176.56", "176.94", "177.75", "181.23", "184.5", "190.34",
"190.7", "193.7", "197.78", "199.66", "199.95", "2,007.44",
"2,009.54", "2,030.52", "2,273.26", "2,440.88", "2,473.26",
"2,633.03", "2,663.28", "2,706.98", "2,723.36", "2,755.44",
"2,759.55", "2,821.46", "2,829.16", "2,835.27", "200.27",
"204.97", "206.63", "208.96", "212.89", "216.38", "217.45",
"232.67", "234.05", "251.6", "253.61", "258.98", "262.16",
"266.48", "266.88", "268.92", "271.27", "276.31", "279.41",
"283.22", "289.51", "292.47", "292.67", "298.71", "3,003.51",
"3,184.47", "3,885.86", "305.69", "307.59", "308.38", "309.54",
"310.48", "313.8", "313.91", "314.72", "317.51", "319.85",
"321.54", "321.57", "321.63", "322.46", "327.56", "328.57",
"331.06", "331.85", "333.85", "333.9", "333.98", "334.28",
"335.22", "335.89", "336.63", "337.3", "337.74", "339.74",
"341.78", "345.12", "345.54", "347.99", "348", "348.13",
"348.48", "348.49", "349.3", "350.18", "350.53", "353.08",
"353.74", "353.98", "354.59", "355.55", "358.47", "359.14",
"359.59", "359.98", "361.84", "362.86", "370.08", "373.83",
"376.4", "394.45", "395.48", "4,166.39", "4,667.87", "4,696.73",
"4,708.79", "4,729.34", "4,731.65", "4,757.80", "4,760.75",
"4,769.30", "415.37", "421.52", "423.58", "428.34", "487.35",
"491.12", "495.1", "495.91", "495.94", "499.07", "517.68",
"527.29", "536.62", "550.83", "572.71", "574.75", "576.42",
"605.69", "613.56", "632.1", "668.87", "669.68", "686.88",
"688.05", "762.93", "770.16", "781.07", "858.09", "858.68",
"864.56", "868.03", "874.65", "879.09", "886.68", "890.64",
"911.58", "954.76"), class = "factor")), .Names = c("UserName",
"Division", "SpoolUsage_max"), class = "data.frame", row.names = c(NA,
-223L))
I am trying to get a box plot for each Division (each division with its own users) side by side.
I have tried the following:
library(reshape2)
library(ggplot2)
p <- ggplot(melt(df), aes(variable, value)) + geom_boxplot()
p <- p + geom_boxplot(fill = "grey80", colour = "#3366FF")
p <- p +xlab("UserName")+ylab("SpoolUsage_Max")+ggtitle("Spool Usage Analysis by Users")
p <- p +coord_flip()
p
I cannot produce a single side-by-side box plot in which each division (with its users) has its own color.
Here you go:
library(dplyr)
# strip the thousands separators, then convert the values to numeric
df <- df %>% mutate(val = as.numeric(gsub(",", "", SpoolUsage_max)))
ggplot(df, aes(Division, val, fill = UserName)) + geom_boxplot()
It may be neater if you use facet_wrap() to give each division its own panel.
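The facet_wrap idea could look like the sketch below. The small toy data frame is hypothetical and only mimics the columns of the question's df; scales = "free_x" is one possible choice so that each panel shows only its own users.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data (same column names, factor columns)
toy <- data.frame(
  UserName = c("P_PT", "P_PT", "P_SH", "P_SH", "GS", "GS", "CYL", "CYL"),
  Division = c("BATCH", "BATCH", "BATCH", "BATCH", "IIT", "IIT", "IIT", "IIT"),
  SpoolUsage_max = c("126.54", "1,599.16", "251.6", "2,835.27",
                     "106.09", "123.19", "123.3", "132.43"),
  stringsAsFactors = TRUE
)

# gsub() works on the labels; calling as.numeric() on the factor directly
# would return the integer level codes instead of the values
toy$val <- as.numeric(gsub(",", "", toy$SpoolUsage_max))

# One panel per Division, one box per user inside each panel
p <- ggplot(toy, aes(UserName, val)) +
  geom_boxplot() +
  facet_wrap(~ Division, scales = "free_x")
p
```

Compared with fill = UserName in a single panel, this avoids a long legend when there are many users.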

How to work out the Net Promoter Score by prop.table()

############ uncoded data
x10<- structure(c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 5L, 8L, 9L, 31L, 1L,
0L, 0L, 0L, 1L, 0L, 1L, 2L, 7L, 2L, 10L, 0L, 2L, 0L, 2L, 2L,
5L, 2L, 4L, 6L, 8L, 4L, 1L, 1L, 3L, 2L, 2L, 6L, 1L, 12L, 18L,
7L, 29L, 8L, 4L, 6L, 8L, 6L, 19L, 3L, 9L, 12L, 3L, 12L, 14L,
1L, 2L, 1L, 3L, 1L, 0L, 4L, 6L, 3L, 11L, 0L, 0L, 0L, 1L, 3L,
7L, 5L, 8L, 21L, 26L, 51L, 0L, 1L, 0L, 3L, 5L, 10L, 9L, 29L,
55L, 60L, 125L, 3L, 0L, 1L, 1L, 3L, 10L, 1L, 6L, 18L, 17L, 13L,
6L, 3L, 4L, 13L, 6L, 33L, 17L, 48L, 84L, 54L, 103L, 34L, 11L,
20L, 27L, 26L, 50L, 29L, 30L, 54L, 28L, 34L, 31L, 5L, 7L, 3L,
4L, 20L, 8L, 16L, 16L, 8L, 41L, 1L, 0L, 0L, 3L, 1L, 3L, 3L, 11L,
19L, 16L, 56L, 0L, 0L, 0L, 0L, 3L, 11L, 3L, 18L, 25L, 21L, 62L,
3L, 0L, 1L, 4L, 2L, 7L, 8L, 15L, 22L, 12L, 19L, 5L, 2L, 8L, 9L,
9L, 42L, 18L, 51L, 70L, 45L, 103L, 29L, 15L, 23L, 34L, 25L, 57L,
23L, 38L, 55L, 30L, 33L, 36L, 5L, 5L, 6L, 6L, 16L, 6L, 10L, 17L,
9L, 35L, 2L, 0L, 1L, 1L, 2L, 4L, 6L, 8L, 22L, 33L, 73L, 0L, 0L,
0L, 1L, 2L, 7L, 7L, 15L, 27L, 21L, 56L, 1L, 2L, 2L, 0L, 2L, 9L,
4L, 8L, 24L, 13L, 17L, 14L, 2L, 8L, 10L, 16L, 51L, 16L, 51L,
69L, 29L, 99L, 44L, 18L, 25L, 34L, 19L, 49L, 26L, 43L, 63L, 15L,
30L, 42L, 9L, 17L, 7L, 3L, 16L, 8L, 13L, 22L, 18L, 45L, 0L, 0L,
1L, 3L, 0L, 7L, 4L, 14L, 15L, 20L, 47L, 0L, 1L, 0L, 1L, 1L, 3L,
3L, 5L, 6L, 11L, 21L, 1L, 0L, 0L, 4L, 2L, 3L, 8L, 7L, 17L, 3L,
13L, 5L, 2L, 6L, 13L, 15L, 34L, 19L, 42L, 62L, 37L, 83L, 52L,
16L, 26L, 26L, 29L, 53L, 28L, 45L, 45L, 15L, 22L, 26L, 8L, 12L,
11L, 5L, 12L, 5L, 7L, 17L, 10L, 28L), .Dim = c(11L, 6L, 5L), .Dimnames = structure(list(
c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
c("I've changed for work/ a new job/ gone on a work plan",
"I want a phone that doesn't offer", "I want Best Mates/ Favourites",
"I was offered or saw a better offer on another network",
"Issues with the network (poor coverage)", "Other"
), YearQuarter = c("2011-09-01", "2011-12-01", "2012-03-01",
"2012-06-01", "2012-09-01")), .Names = c("", "", "YearQuarter"
)), class = "table")
############ recoded data
x10 <- structure(c(40L, 3L, 13L, 12L, 3L, 9L, 12L, 13L, 10L, 36L, 16L,
30L, 15L, 54L, 21L, 14L, 22L, 10L, 77L, 16L, 29L, 185L, 28L,
84L, 30L, 19L, 24L, 157L, 82L, 132L, 62L, 197L, 84L, 49L, 78L,
32L, 72L, 11L, 30L, 83L, 17L, 43L, 31L, 25L, 37L, 148L, 93L,
121L, 63L, 206L, 93L, 44L, 80L, 27L, 106L, 16L, 30L, 77L, 17L,
42L, 30L, 20L, 32L, 128L, 117L, 120L, 45L, 215L, 106L, 63L, 102L,
35L, 67L, 15L, 29L, 32L, 9L, 11L, 16L, 18L, 24L, 120L, 94L, 104L,
37L, 230L, 90L, 38L, 79L, 24L), .Dim = c(3L, 6L, 5L), .Dimnames = structure(list(
c("Promoters", "Detractors", "Passive"), c("I've changed for work/ a new job/ gone on a work plan",
"I want a phone that doesn't offer", "I want Best Mates/ Favourites",
"I was offered or saw a better offer on another network",
"Issues with the network (poor coverage)", "Other"
), YearQuarter = c("2011-09-01", "2011-12-01", "2012-03-01",
"2012-06-01", "2012-09-01")), .Names = c("", "", "YearQuarter"
)), class = "table")
x10.p <- round(prop.table(x10,c(3,2)),2)*100
Hi there
The Net Promoter Score is based on a question that asks consumers to rate 'the likelihood to recommend the product or the service' on a zero-to-ten scale. People who answer 10 or 9 are called 'Promoters', people who answer 8 or 7 are 'Passive', and people who answer 6 or below are 'Detractors'. The Net Promoter Score is the percentage of 'Promoters' minus the percentage of 'Detractors'.
I summarised and recoded the answers to this question into a table, x10, covering Sep 2011 to Sep 2012. The numbers are actual counts of people in each group (Promoters, Detractors and Passive). Apologies for the three-dimensional table. I am interested in the Net Promoter Score for each reason (i.e. what is the percentage difference between promoters and detractors for "I've changed for work/ a new job/ gone on a work plan" in Sep 2012).
The Net Promoter Score needs a bit of manipulation before I can plot it. Does anyone know how to do it?
Cheers
First, don't round until you've done all your calculations (otherwise the percentages may not add up to 100). The code below works on the uncoded table, whose rows are the scores 0 to 10.
x10.p <- prop.table(x10,c(3,2))*100
# promoters: the last two rows (scores 9 and 10)
promoters <- apply(x10.p, 2:3, function(x) sum(tail(x, 2)))
# detractors: the first seven rows (scores 0 to 6)
detractors <- apply(x10.p, 2:3, function(x) sum(head(x, 7)))
# passive is everything else
passive <- 100 - (detractors + promoters)
# the net score
net <- promoters - detractors
net
YearQuarter
2011-09-01 2011-12-01 2012-03-01 2012-06-01 2012-09-01
I've changed for work/ a new job/ gone on a work plan 66.071429 50.00000 53.982301 59.210526 46.846847
I want a phone that doesn't offer 37.500000 52.86195 46.153846 44.117647 44.230769
I want Best Mates/ Favourites -2.857143 15.06849 6.451613 12.195122 -3.448276
I was offered or saw a better offer on another network 24.390244 20.21563 15.193370 3.013699 8.176101
Issues with the network (poor coverage) -43.333333 -39.35860 -39.502762 -46.448087 -54.061625
Other -17.391304 -18.23899 -23.841060 -19.500000 -29.078014
Since you want September 2012, select just that column, with drop = FALSE to ensure the result is still a matrix with one column.
net[,'2012-09-01', drop = FALSE]
YearQuarter
2012-09-01
I've changed for work/ a new job/ gone on a work plan 46.846847
I want a phone that doesn't offer 44.230769
I want Best Mates/ Favourites -3.448276
I was offered or saw a better offer on another network 8.176101
Issues with the network (poor coverage) -54.061625
Other -29.078014
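If you start from the recoded table instead (rows "Promoters", "Detractors", "Passive"), the rows can be picked by name rather than by position. The sketch below is an assumption-laden illustration: the small array `counts` is a hypothetical cut-down copy holding only the first two reasons of the 2011-09-01 slice of the recoded data from the question.

```r
# Cut-down copy of the recoded table: 3 groups x 2 reasons x 1 quarter
counts <- array(
  c(40, 3, 13,    # I've changed for work/ a new job/ gone on a work plan
    12, 3, 9),    # I want a phone that doesn't offer
  dim = c(3, 2, 1),
  dimnames = list(
    c("Promoters", "Detractors", "Passive"),
    c("I've changed for work/ a new job/ gone on a work plan",
      "I want a phone that doesn't offer"),
    YearQuarter = "2011-09-01"
  )
)

# Percentages within each reason and quarter, left unrounded
pct <- prop.table(counts, c(3, 2)) * 100

# Net score: promoters minus detractors, selected by row name
net_recoded <- pct["Promoters", , ] - pct["Detractors", , ]
net_recoded
```

For these two reasons the result is 66.07143 and 37.5, which agrees with the 2011-09-01 column of the net table above. Applied to the full recoded x10, the same two lines would return a reason-by-quarter matrix.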

Resources