dplyr: Add variable as function of all variables in each row - r

I am trying to add a new variable in a dataframe using dplyr but I find it difficult.
The new variable should be the number of runs of length 2 across all the variable values in each row. Using apply I would do this:
tmp$rle = apply(tmp,1,function(x) sum(rle(x)$lengths==2))
How can I perform this action using dplyr and mutate (without spelling out all the variable names)?
tmp <- structure(list(X1 = c(3, 1, 1, 4, 4, 1, 3, 2, 2, 2, 1, 3, 3,
2, 3, 1, 4, 2, 3, 2), X2 = c(2, 4, 2, 2, 3, 2, 1, 1, 3, 1, 3,
1, 4, 4, 4, 1, 3, 1, 2, 1), X3 = c(2, 4, 3, 3, 3, 2, 4, 3, 4,
4, 2, 3, 3, 3, 1, 3, 1, 4, 4, 2), X4 = c(1, 3, 3, 1, 1, 3, 2,
4, 4, 1, 4, 4, 1, 1, 1, 3, 1, 3, 1, 1), X5 = c(4, 2, 4, 2, 1,
4, 1, 2, 2, 4, 3, 4, 1, 1, 4, 4, 2, 4, 4, 3), X6 = c(3, 1, 4,
3, 4, 4, 4, 1, 1, 3, 4, 2, 2, 2, 3, 2, 3, 2, 2, 3), X7 = c(4,
2, 1, 1, 2, 1, 3, 3, 3, 3, 2, 2, 4, 4, 2, 4, 4, 3, 3, 4), X8 = c(1,
3, 2, 4, 2, 3, 2, 4, 1, 2, 1, 1, 2, 3, 2, 2, 2, 1, 1, 4)), .Names = c("X1",
"X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names = c(NA,
20L), class = "data.frame")

Rather than dplyr, you might consider using the purrr package, which RStudio has fairly recently introduced as a complement to dplyr to, among other things, better handle vectors and lists (its data-frame helpers such as by_row() have since been moved to the companion package purrrlyr). In your case, tmp is a numeric data frame where you want to treat each row as a vector. The code could look like:
library(dplyr)     # provides %>%
library(purrrlyr)  # by_row()
tmp <- tmp %>% by_row(..f = function(x) sum(rle(x)$lengths == 2),
                      .to = "rle", .collate = "cols")
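A dplyr-only alternative (dplyr 1.0.0 or later) is rowwise() combined with c_across(), which collects the current row into one vector without naming the columns. The sketch below uses a small hypothetical data frame in place of tmp, and names the new column n_runs2 (an arbitrary choice) so it does not share a name with the rle() function.

```r
library(dplyr)

# Hypothetical small data frame standing in for tmp
df <- data.frame(X1 = c(1, 1, 2), X2 = c(1, 2, 2),
                 X3 = c(2, 2, 2), X4 = c(3, 3, 3))

df2 <- df %>%
  rowwise() %>%
  # c_across(everything()) gathers the current row's values into one vector,
  # so no column names need to be spelled out
  mutate(n_runs2 = sum(rle(c_across(everything()))$lengths == 2)) %>%
  ungroup()
df2
```

rowwise() evaluates the expression once per row, so on large data the apply() version may well be faster.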

In dplyr:
library(dplyr)
tmp <- mutate(tmp, rle = apply(tmp, 1, function(x) sum(rle(x)$lengths == 2)))
I am having a difficult time QA'ing this as I am unfamiliar with what results I should expect out of the rle function. I tried comparing results with your apply version of the code, and it seems that set.seed() is perhaps important for replicability? Am I understanding this correctly?
Here is the QA attempt I made: (original tmp should be exactly the same: I just wrapped the lines at the list() and structure() arguments.)
set.seed(1)
tmp <- structure(list(X1 = c(3, 1, 1, 4, 4, 1, 3, 2, 2, 2, 1, 3, 3, 2, 3, 1, 4, 2, 3, 2),
X2 = c(2, 4, 2, 2, 3, 2, 1, 1, 3, 1, 3, 1, 4, 4, 4, 1, 3, 1, 2, 1),
X3 = c(2, 4, 3, 3, 3, 2, 4, 3, 4, 4, 2, 3, 3, 3, 1, 3, 1, 4, 4, 2),
X4 = c(1, 3, 3, 1, 1, 3, 2, 4, 4, 1, 4, 4, 1, 1, 1, 3, 1, 3, 1, 1),
X5 = c(4, 2, 4, 2, 1, 4, 1, 2, 2, 4, 3, 4, 1, 1, 4, 4, 2, 4, 4, 3),
X6 = c(3, 1, 4, 3, 4, 4, 4, 1, 1, 3, 4, 2, 2, 2, 3, 2, 3, 2, 2, 3),
X7 = c(4, 2, 1, 1, 2, 1, 3, 3, 3, 3, 2, 2, 4, 4, 2, 4, 4, 3, 3, 4),
X8 = c(1, 3, 2, 4, 2, 3, 2, 4, 1, 2, 1, 1, 2, 3, 2, 2, 2, 1, 1, 4)),
.Names = c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"),
row.names = c(NA, 20L), class = "data.frame")
tmpApply <- tmp
tmpApply$rle = apply(tmp, 1, function(x) sum(rle(x)$lengths==2))
tmpDplyr <- tmp %>% mutate(rle = apply(tmp, 1, function(x) sum(rle(x)$lengths==2)))
tmpApply
tmpDplyr
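rle() returns the lengths and values of consecutive runs in a vector, and neither rle() nor apply() draws random numbers, so set.seed() has no influence on the result. The sketch below shows what rle() returns for one row and checks that the apply() and mutate() versions agree; tmp_small and run2 are hypothetical names used only for this illustration.

```r
library(dplyr)

# rle() on one row: the runs are (3, 3), (2), (1)
r <- rle(c(3, 3, 2, 1))
r$lengths            # 2 1 1
r$values             # 3 2 1
sum(r$lengths == 2)  # number of runs of length 2

# Hypothetical small data frame in place of tmp; no set.seed() needed
tmp_small <- data.frame(X1 = c(3, 1, 2), X2 = c(3, 4, 2),
                        X3 = c(2, 4, 2), X4 = c(1, 3, 2))
run2 <- function(x) sum(rle(x)$lengths == 2)
with_apply <- apply(tmp_small, 1, run2)
with_dplyr <- mutate(tmp_small, rle = apply(tmp_small, 1, run2))$rle
identical(with_apply, with_dplyr)
```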

Related

Frequency table of a categorical variable based on a different variable value - R

I have a data frame with two categorical variables: column 1 is Var1 and column 2 is Var2. I want to create a frequency table with the number of times Var1 status is 1, 2 and 3 when Var2 status is 1, and likewise when Var2 status is 2 and 3. At the end I want to plot a histogram with the Var2 status (1, 2, 3) on the x-axis and, on the y-axis, the frequency of each Var1 status for each Var2 status. Thanks for the help.
structure(list(`1` = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1), `2` = c(3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 3, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2)), row.names = c(NA, -101L), class = c("tbl_df", "tbl",
"data.frame"))
You are probably looking for a bar plot of frequencies rather than the histogram in your description.
library(dplyr)    # group_by(), summarise()
library(ggplot2)  # ggplot(), geom_col()
# df is the data frame from the dput() above; add column names
names(df) <- paste0("Var", 1:2)
# group and summarise to find the number of occurrence for each paired Var2, Var1 combination
plotting_df <- df %>% group_by(Var2, Var1) %>% summarise(n=n())
# plot using the new summary data frame
ggplot(plotting_df, aes(x=factor(Var2), y=n, fill=factor(Var1))) +
geom_col(position="dodge")
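For the frequency table itself, base R's table() cross-tabulates the two columns directly. A sketch with a small hypothetical df whose columns are already named Var1 and Var2; unlike group_by() + summarise(), table() also reports combinations that never occur, as 0.

```r
# Hypothetical stand-in for the question's data (columns renamed Var1, Var2)
df <- data.frame(Var1 = c(2, 2, 2, 1, 1, 2),
                 Var2 = c(3, 3, 2, 2, 2, 2))

# Rows are Var2 statuses, columns are Var1 statuses
freq <- table(Var2 = df$Var2, Var1 = df$Var1)
freq

# Long format: one row per Var2/Var1 pair with a Freq column
as.data.frame(freq)
```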

converting NULL to numeric and taking the sum of lists

I have a BTO dataset, which I converted from long to wide format to prepare it for diversity measurements using the diversity function from the vegan package.
To achieve this I used this code:
diversity <- pivot_wider(bird_case, names_from = ENGLISH_NAME, values_from = HOW_MANY)
The result contains list elements (list-columns), because I had previously converted the months into seasons. I wish to sum each list, so that only a single sum remains in each cell. The NULL values should become 0.
I have tried to replace the NULL values with zero using this:
diversity[diversity == "NULL"] <- 0
but it won't work.
For converting the list elements and taking the sum, I have tried aggregate() to no avail.
Here's a reproducible example:
structure(list(year = c(2018, 2019, 2017, 2015, 2014, 2015, 2017,
2017, 2016, 2019, 2018, 2016, 2016, 2016, 2019, 2019, 2018, 2017,
2015, 2018, 2015, 2017, 2015, 2016, 2016, 2016, 2018, 2018, 2017,
2014, 2015, 2017, 2014, 2014, 2017, 2019, 2010, 2011, 2011, 2012,
2019, 2012, 2013, 2019, 2017, 2011, 2017, 2016, 2016, 2010),
Season = c("Winter", "Winter", "Summer", "Winter", "Winter",
"Autumn", "Autumn", "Winter", "Spring", "Autumn", "Spring",
"Winter", "Summer", "Autumn", "Summer", "Spring", "Summer",
"Spring", "Spring", "Autumn", "Summer", "Summer", "Autumn",
"Summer", "Autumn", "Winter", "Spring", "Winter", "Winter",
"Summer", "Winter", "Autumn", "Autumn", "Winter", "Spring",
"Winter", "Summer", "Spring", "Summer", "Autumn", "Winter",
"Winter", "Winter", "Spring", "Summer", "Winter", "Autumn",
"Winter", "Spring", "Winter"), POSTCODE = c("NR29 5QA", "NR29 5QA",
"NR29 5QA", "NR29 5QA", "NR29 5QA", "NR29 5QA", "NR29 5QA",
"NR29 5QA", "NR29 5QA", "NR29 5QA", "NR29 5QA", "NR29 5QA",
"NR29 5QA", "NR29 5QA", "NR29 5QA", "NR29 5QA", "NR29 5QA",
"NR29 5QA", "NR29 5QA", "NR29 5QA", "NR29 5QA", "NR15 1TS",
"NR15 1TS", "NR15 1TS", "NR15 1TS", "NR15 1TS", "NR15 1TS",
"NR15 1TS", "NR15 1TS", "NR15 1TS", "NR15 1TS", "NR15 1TS",
"NR15 1TS", "NR15 1TS", "NR15 1TS", "NR15 1TS", "PE32 1TL",
"PE32 1TL", "PE32 1TL", "PE32 1TL", "PE32 1TL", "PE32 1TL",
"PE32 1TL", "PE32 1TL", "PE32 1TL", "PE32 1TL", "PE32 1TL",
"PE32 1TL", "PE32 1TL", "PE32 1TL"), LOC_ID = c("LOC568364",
"LOC568364", "LOC568364", "LOC568364", "LOC568364", "LOC568364",
"LOC568364", "LOC568364", "LOC568364", "LOC568364", "LOC568364",
"LOC568364", "LOC568364", "LOC568364", "LOC568364", "LOC568364",
"LOC568364", "LOC568364", "LOC568364", "LOC568364", "LOC568364",
"LOC1163128", "LOC1163128", "LOC1163128", "LOC1163128", "LOC1163128",
"LOC1163128", "LOC1163128", "LOC1163128", "LOC1163128", "LOC1163128",
"LOC1163128", "LOC1163128", "LOC1163128", "LOC1163128", "LOC1163128",
"LOC569508", "LOC569508", "LOC569508", "LOC569508", "LOC569508",
"LOC569508", "LOC569508", "LOC569508", "LOC569508", "LOC569508",
"LOC569508", "LOC569508", "LOC569508", "LOC569508"), Wren = list(
c(1, 1, 1, 1, 1, 1, 1, 1), c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1), c(1, 1, 1, 1), c(1, 1, 1, 1, 1, 1, 1), 1, c(1,
1, 1, 1, 1), c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), c(1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1), c(1, 1, 1, 1, 1, 1), c(1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1), c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1), c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), c(1, 1, 1,
1, 1, 1, 1, 1, 1, 1), c(1, 1, 1, 1, 1, 1), c(1, 1, 1,
1, 3, 1, 1, 1), c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), c(1,
1, 1, 1, 1, 1, 1, 1), c(1, 1, 1, 1, 1, 1, 2, 1, 1), c(1,
1, 1, 1, 1, 1, 1), c(1, 1, 1, 1, 2, 1), c(2, 1, 2, 3,
1, 1, 1), c(1, 1), c(1, 1, 1), c(1, 1, 1, 1, 1), c(1,
1), c(1, 1), c(1, 1, 1), c(1, 1, 1, 1, 1, 1, 1, 1, 1),
NULL, 1, c(2, 1, 1, 2, 1), NULL, NULL, c(1, 1, 1), NULL,
1, NULL, NULL, c(1, 1), c(1, 1, 1, 1, 1, 1), c(1, 1,
1, 1), c(1, 1), NULL, NULL, c(1, 1), 1, c(1, 1, 1), NULL,
c(1, 1, 1, 1)), Dunnock = list(c(2, 2, 1, 2, 1, 1, 1,
2, 2, 2), c(2, 1, 2, 2, 2, 2, 1), c(1, 1, 2, 1, 3, 1, 2),
c(1, 2, 2, 2, 2, 2, 2, 1, 1), 2, c(1, 1, 1, 2, 1, 1,
2), c(1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1), c(2, 2, 2, 2,
2, 1, 2, 2, 1, 2, 2), c(2, 5, 2, 1, 3, 2, 2, 3, 2, 3,
1), c(2, 1, 1, 2, 2), c(3, 2, 3, 2, 2, 3, 3, 2, 2), c(1,
1, 1, 1, 1, 1), c(1, 2, 1, 2, 2, 2, 2), c(1, 1, 2, 1,
2, 1, 2, 2, 1, 2), c(3, 4, 2, 5, 3, 5, 4, 2), c(2, 2,
2, 2, 1, 2, 3, 2, 2), c(3, 3, 3, 3, 3, 2, 1, 2, 3, 1),
c(2, 3, 2, 2, 2, 2, 2, 2, 2, 5, 4, 2), c(2, 2, 2, 1,
2, 1, 2, 2, 2, 2), c(1, 2, 1, 1, 2, 2, 1), c(1, 1, 1,
2, 1, 1), c(3, 4, 6, 3, 3, 3), c(1, 1, 2, 1), c(2, 1),
c(2, 2, 1, 2, 1), c(2, 1, 2, 1, 2, 2), c(2, 2), c(1,
1, 1, 2), c(2, 3, 2, 2, 2, 3, 3, 2, 2, 2), 2, c(2, 2,
3, 2, 2), c(2, 1, 2, 2, 2, 2), 1, NULL, c(3, 2), c(1,
1), c(1, 2, 1, 1, 1, 1, 1, 2), c(2, 2, 1, 1, 1, 1), c(3,
3, 2, 1, 2, 2, 2, 1, 1, 1), c(2, 1, 1, 1, 1), c(1, 1,
1, 1, 1, 2, 2, 2), c(1, 1, 2, 1, 1, 2, 1, 1), c(1, 1,
2, 1, 1), c(3, 2, 1, 5, 2, 1, 2, 2, 2), c(3, 3, 1, 1,
1, 3, 2, 1), c(1, 1, 2, 1), c(1, 2, 2, 2, 1, 2, 1, 2,
2, 2, 1, 1), c(1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1),
c(1, 1, 2, 1, 4, 4, 1, 2, 2, 2), c(2, 1, 1, 4, 1, 1,
1, 2, 1, 1)), `Blue Tit` = list(c(1, 1, 2, 3), c(2, 2,
3, 2, 2), 4, c(4, 2, 3, 4), 2, c(2, 2), c(1, 2, 2), c(2,
2), c(2, 2, 1, 2, 2), NULL, c(2, 2, 2, 5, 2, 2, 2, 2, 2),
c(2, 1, 2, 2, 3, 2), 2, NULL, 7, c(2, 2, 2, 2, 2, 2,
2, 2, 2), NULL, c(1, 1, 2, 2, 2, 2, 2), c(4, 2, 4, 3,
7, 3, 2), 1, c(2, 2, 3), c(8, 10, 10, 12, 10, 8, 5, 12
), c(6, 4, 4, 6, 4), c(12, 6, 6, 6, 6), c(4, 4, 5, 5,
8), c(10, 6, 6, 4, 6, 6, 4), 4, c(10, 4, 4, 8, 6), c(4,
6, 4, 10, 6, 6, 8, 7, 6), c(12, 12, 6), c(12, 8, 12,
12, 12, 10, 10), c(10, 5, 10, 5, 10), c(12, 12, 6), c(6,
6), c(4, 2, 2, 2), c(2, 6), c(3, 2, 2, 1, 2, 1, 2), c(2,
2, 2, 1, 2, 1), c(2, 4, 1, 2, 1, 2, 2, 1, 2), c(4, 3,
1, 2, 2, 2, 2, 3, 5, 4), c(2, 4, 3, 3, 1, 2, 2), c(2,
4, 2, 2, 1, 2, 1, 1, 3), c(3, 3, 2, 2, 3, 2, 3, 2), c(1,
2, 1, 2, 2, 2, 2, 1, 2), c(5, 3, 9, 4, 4, 3, 9, 5), c(1,
2, 1, 2, 3, 2, 1, 2, 3, 3, 2), c(4, 3, 5, 2, 3, 4, 3,
3, 4, 5, 2), c(3, 3, 3, 3, 4, 2, 3, 4, 3, 5, 3), c(2,
2, 2, 1, 1, 2, 1, 2, 1, 2, 4), c(2, 2, 2, 3, 2, 2, 2,
1)), `Pied/White Wagtail` = list(c(1, 2, 2, 2, 1, 1,
1, 2, 2, 2), c(2, 1, 1, 1, 2, 1, 1, 2, 2), c(1, 1, 1, 1),
NULL, NULL, NULL, 2, c(2, 2, 2, 2), c(1, 1, 1, 1, 1),
c(2, 2, 2, 1), c(2, 2, 2, 2, 2, 2, 2, 2, 2), NULL, c(2,
2), NULL, c(2, 2, 1, 2, 2, 2), c(2, 2, 2, 2, 2, 2, 1,
2, 2, 2), c(2, 2, 2, 2, 1, 2, 3), c(1, 2, 2, 2, 2, 2),
NULL, c(1, 1), 1, 1, NULL, NULL, 1, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
1, NULL, 1, c(1, 1, 1, 1, 1), c(2, 1, 2, 1), c(1, 1),
c(1, 1), 1, 1, 1, 1, c(1, 1), c(1, 1, 1, 1)), `Collared Dove` = list(
c(2, 2, 2, 2, 2, 2, 2, 2, 2), c(2, 2, 3, 2, 2, 2, 2,
3, 2, 2), c(2, 3, 2, 2, 2, 2, 2, 3, 3), c(1, 1, 2, 2),
NULL, c(2, 2, 2, 2, 2), c(2, 2, 2, 2, 2, 2, 2, 2, 2,
3), c(2, 2, 2, 1, 2, 2, 2, 1, 1), c(2, 2, 2, 2, 2, 2,
2), c(2, 2, 4, 4, 2, 2, 22, 2), c(2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2), c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
c(2, 2, 2, 2, 2, 2, 2, 2, 2), c(2, 2, 2, 2, 2, 2, 2,
2), c(2, 3, 3, 4, 2, 2, 2, 2, 2), c(2, 3, 3, 3, 2, 3,
3, 3, 3, 3, 2), c(2, 1, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3),
c(2, 2, 2, 2, 2, 2), c(1, 2, 2, 1, 2), c(2, 2, 2, 2,
2, 2, 2, 2, 2), c(1, 2, 2, 1, 2), c(2, 2, 2, 2, 2, 2,
1), c(1, 1), c(1, 1), c(1, 2, 2, 2, 1), c(2, 2, 1, 2),
2, c(2, 1), c(3, 1, 1, 1, 1, 2, 2), NULL, c(2, 1, 1),
c(2, 2), 1, 1, c(2, 2), NULL, c(9, 9, 17, 8, 19), c(6,
3, 2, 3, 3, 5, 3), c(16, 9, 12, 3, 7, 5), c(4, 4, 3,
3, 5, 3, 2), c(2, 2, 3, 3, 2, 3, 4, 2), c(2, 2, 2, 3,
4, 4, 2, 12, 3, 5, 4), c(2, 2, 3, 3, 2, 3, 2, 3, 3),
c(3, 3, 3, 3, 2, 5, 3, 1, 3), c(4, 2, 3, 2, 7, 2, 3),
c(3, 1, 12, 3, 4, 4, 2, 5, 5, 12), c(3, 2, 1, 5, 3, 2,
2, 1, 2, 3, 2), c(3, 2, 2, 5, 3, 3, 2, 2, 10), c(2, 2,
1, 1, 3, 2, 1, 1, 2), c(6, 2, 6, 2, 5, 3, 2, 2, 4, 11,
3, 2)), `Great Tit` = list(c(1, 2, 1, 1, 1, 1, 1, 1),
c(1, 2, 1, 2, 1, 2, 1, 1), NULL, c(1, 3, 2, 5, 3, 3,
4, 1), NULL, c(1, 2, 1, 1), c(1, 1), NULL, c(1, 1, 1,
2, 1, 1), 1, c(1, 1), c(1, 1, 1, 1), 1, NULL, c(2, 2,
1, 1), c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2), 1, 3, c(2, 2,
2, 1), c(4, 2, 1, 2), c(2, 1), c(8, 8, 12, 6, 8), c(2,
2, 3, 2), c(8, 3, 6, 4, 6), c(2, 2, 4, 2), c(1, 1, 2),
c(2, 2, 2), c(1, 2, 1, 2, 1, 2), c(2, 2, 2, 2, 2, 2,
2), c(4, 4, 6), c(2, 4, 2, 2, 4, 2, 2), c(3, 4, 2, 2,
3), 6, c(2, 2), c(1, 2), 2, c(2, 1, 1, 1, 1, 2), c(1,
1, 2, 2, 1, 2, 1), c(1, 1, 1, 2, 1, 2, 1, 2), c(2, 2,
3, 1, 2, 4, 1, 3), c(3, 1, 1, 2), c(1, 2, 2, 1, 1, 2,
2, 2, 1, 2), c(2, 1, 2, 1, 1, 1), c(2, 1, 2, 1), c(2,
3, 2, 3, 2, 1), c(1, 1, 1, 2, 2, 2, 1), c(1, 2, 2, 1,
1, 2, 3, 1, 2, 1, 3, 3), c(2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2), c(2, 2, 2, 8, 1, 2, 2, 2, 1, 2, 2, 2), c(2, 1,
1, 1, 1, 1, 1, 2)), Robin = list(c(1, 1, 3, 1, 3, 1,
1, 3, 3, 2), c(2, 2, 1, 2, 2), c(1, 2, 1, 1, 1, 1, 1), c(1,
1, 2, 1, 1, 1, 1, 1, 1, 2), 1, c(1, 1, 1, 1, 2, 1, 1), c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1), c(1, 1, 3, 1, 1, 1, 1, 1, 1,
1, 1, 2, 1), c(2, 2, 1, 1, 3, 1, 1, 2, 2, 2), c(2, 2, 1,
2, 2, 2, 2), c(1, 2, 2, 3, 1, 2, 2, 3), c(1, 1, 2, 1, 1,
1, 1, 1, 1, 1, 1, 2, 1), c(2, 1, 1, 1, 2, 1), c(1, 1, 1,
1, 1, 1, 1, 2, 1, 1), c(1, 5, 1, 1, 2, 2, 2, 1), c(2, 2,
2, 1, 2, 2, 2, 2, 2), c(2, 2, 2, 1, 3, 3, 2), c(2, 2, 6,
1, 1, 2), c(1, 2, 1, 1, 2, 2, 2, 2, 2), c(1, 1, 1, 1, 2,
2, 2, 2, 1), c(1, 1, 1, 1, 2, 1, 1), c(1, 3, 3, 3), c(3,
1, 1, 2), c(1, 1, 1, 1, 1), c(2, 2, 2, 2, 2, 3, 2), c(2,
1, 2, 1, 3), c(2, 2), c(3, 1, 3, 5, 2, 2, 2, 2, 2), c(3,
4, 4, 2, 3, 2, 2, 4, 2), 1, c(4, 2, 4, 2, 4), c(1, 1, 3),
c(2, 2, 2), 2, c(3, 2, 2), c(1, 2), c(1, 1, 1, 1, 1),
c(3, 2, 2, 2, 4, 2, 2, 1), c(2, 1, 2, 1, 1), c(1, 1,
1, 1), c(1, 1, 1, 1, 1, 1, 1, 1, 1), c(1, 1, 2, 1, 1),
c(1, 1, 1, 1, 1), c(1, 1, 1, 1, 1, 1, 1), c(2, 4, 3,
2, 1, 6, 2, 3, 1, 2), c(1, 1, 2, 1, 1, 1, 2, 1, 2, 1),
c(1, 1, 1, 1, 1, 2, 1, 1, 1), c(1, 1, 1, 2, 1, 1, 1,
1, 1), c(2, 1, 2, 1, 2, 1, 2, 1), c(1, 1, 1, 1, 2, 1,
1, 1, 2)), Greenfinch = list(2, c(2, 2, 2, 2, 2, 2, 2
), 1, c(1, 1, 2), NULL, NULL, NULL, c(2, 2, 2), c(3, 1, 2,
3, 3), 2, c(2, 5, 2, 2, 2, 2, 2, 5, 2, 2), NULL, c(2, 1,
3, 2), NULL, c(1, 2, 1, 2, 1, 1), c(2, 2), 2, 1, c(2, 2,
2, 1, 2, 2, 1, 2, 2, 2), NULL, c(3, 1, 3), c(4, 2, 4), 1,
c(2, 2, 4, 3, 2, 2), c(2, 2, 1, 2, 4, 2, 2, 2), c(2,
2, 3, 2, 3), c(3, 1), c(2, 2, 2, 2, 3), c(2, 6, 4, 2,
2, 2), 4, c(5, 5, 5, 5), c(2, 2, 1, 4, 2, 4, 4), 4, c(2,
2), c(4, 1, 4), 2, c(7, 2, 3, 2, 2, 3, 4, 4, 3), c(4,
3, 2, 1, 2, 2, 2), c(6, 1, 3, 2, 1, 2, 2), c(3, 1, 2,
3), 1, c(1, 1, 3, 3, 1, 5, 2, 1, 1, 3, 1), c(1, 2, 2,
2, 3, 1, 3), c(1, 1, 3, 1, 1, 3, 1), c(1, 4, 1, 3, 4),
c(2, 2, 1, 1, 1), c(2, 2, 5, 2, 1, 2, 1, 1), c(7, 2,
6, 1, 2), c(2, 1, 2, 1, 1), c(4, 2, 1, 1, 2, 1)), `House Sparrow` = list(
NULL, c(2, 2, 2, 2, 2, 2), NULL, c(2, 2, 4, 6, 3, 4,
3, 3), 3, c(3, 2, 2, 2), NULL, NULL, NULL, c(2, 2, 2),
c(1, 2, 2), c(2, 2), NULL, NULL, c(3, 5), c(2, 2, 2,
2, 2, 2, 2, 2), NULL, NULL, c(3, 3, 3, 3, 2, 3, 5, 3,
3, 3), NULL, c(2, 2, 1, 1, 1, 2), NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, c(20, 14, 12, 10, 8, 14, 21), c(6, 5, 9,
9, 9, 6), c(13, 12, 5, 21, 11, 12, 16, 10, 15), c(3,
2, 7, 3, 1), c(10, 11, 15, 8, 12, 15, 5, 16), c(1, 5,
5, 5, 5, 4, 5, 2, 6, 4, 4), c(2, 4, 1, 4, 3, 3, 7, 7,
3, 5, 3), c(9, 10, 10, 7, 8, 10, 10, 6, 8, 6, 12, 9),
c(10, 5, 13, 14, 4, 5, 9, 9, 10, 8, 9), c(10, 9, 10,
7, 9, 10, 8, 7, 9, 14), c(3, 7, 5, 10, 2, 6, 14, 6, 3,
7, 3), c(7, 9, 11, 5, 5, 7, 7, 6, 6, 10, 5, 7, 16), c(5,
7, 5, 5, 6, 8, 7, 4, 5), c(15, 10, 12, 9, 3, 9, 10, 11
)), `Coal Tit` = list(1, c(1, 1, 1), NULL, c(2, 3, 2,
2, 4, 2, 2), NULL, 2, NULL, NULL, c(1, 1, 1), NULL, c(2,
1), 1, 1, NULL, 2, NULL, 1, NULL, 2, 1, 2, c(1, 1, 1), c(2,
2, 2, 2, 2, 1), c(1, 1, 1, 1, 1), c(1, 1, 1, 1), c(1, 1,
2), 1, c(2, 1, 1, 1, 1, 1), c(2, 2, 1, 2, 1, 2, 2), c(2,
2), c(2, 2, 2), c(2, 2, 2, 2, 2, 2, 2, 2), c(2, 2, 2), 2,
c(1, 1), NULL, 1, NULL, c(1, 1, 1), c(2, 1, 1, 2, 2,
1, 1, 3, 1), c(1, 1, 1, 1), c(2, 1, 1, 1, 1, 2), c(1,
1, 1, 2, 1, 1), NULL, c(2, 1, 1, 1), c(1, 1), c(1, 2,
1, 1, 2, 1, 2, 1, 2), c(1, 2, 1, 2, 1, 1, 1), c(1, 1,
1, 1, 1, 1, 1, 1, 1), 1), Woodpigeon = list(c(2, 3, 3,
3, 3, 3, 3, 5, 3), c(3, 4, 3, 3, 3, 5, 3, 3, 2, 4, 3), c(2,
1, 3, 3, 3, 3, 1, 3), c(3, 3, 3, 4, 1, 1, 5, 5, 5), 2, c(3,
4, 1, 3, 3, 3, 1, 5, 3), c(2, 6, 5, 3, 7, 5, 2, 1, 3, 2,
2), c(3, 3, 3, 3, 2, 2, 3, 4, 3, 3, 5, 5, 1), c(5, 5, 5,
3, 5, 4, 4, 5, 7), c(5, 4, 3, 4, 5, 4), c(3, 3, 3, 3, 3,
3, 5), c(3, 2, 3, 3, 3, 5, 6, 3, 3, 5, 5), c(5, 3, 2, 5,
3, 5, 3, 3, 3), c(2, 3, 3, 3, 4, 5, 3, 5, 5), c(3, 3, 3,
3, 5, 3, 3, 2, 3), c(3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 2), c(3,
2, 3, 3, 3, 3, 3, 3, 3, 3, 3), c(3, 4, 3, 3, 5, 3, 3, 3),
c(3, 3, 4, 5, 5, 3, 3, 3), c(3, 3, 3, 3, 2, 3, 3, 3),
c(2, 2, 3, 3, 3, 2, 3, 3), c(4, 4, 4, 4), c(10, 8, 10
), c(5, 5, 5), c(6, 4, 6), c(6, 6, 6, 10, 6), c(6, 10
), c(20, 10, 10, 10, 4, 10, 8), c(6, 6, 4, 4, 6, 4, 4,
6, 6), NULL, c(8, 8, 8), c(3, 4, 4, 6, 3, 6, 3), NULL,
NULL, c(6, 4, 6, 4), 1, c(3, 3, 2, 3, 1, 3, 2, 1), c(5,
3, 4, 4, 2, 3, 3), c(1, 2, 5, 1, 4, 4, 3, 4, 2, 5, 2),
c(3, 2, 2, 1, 3, 2, 2, 1), c(5, 6, 2, 6, 2), c(1, 6,
2, 6, 2, 3, 3, 3), c(5, 3, 5, 4, 4, 2, 2, 5), c(5, 5,
3, 4, 2, 3, 5, 4), c(3, 2, 2, 2, 3, 2, 5), c(2, 2, 5,
3, 3, 5, 3, 4), c(2, 2, 1, 1, 5, 6, 2, 7, 5, 2, 3), c(5,
2, 3, 5, 2, 1, 5, 6, 4, 2), c(2, 3, 4, 3, 3, 4, 3, 3,
3, 3), c(7, 5, 3, 2, 5, 9, 2, 3, 3, 4, 3)), Blackbird = list(
c(3, 3, 1, 3, 3, 3, 3, 5, 5), c(3, 3, 3, 3, 3, 3, 5,
3, 3, 3), c(2, 1, 3, 3, 3, 3, 3), c(5, 5, 11, 7, 3, 11,
15, 10, 5, 3), NULL, c(7, 2, 9, 3, 6, 3, 2, 3, 5), c(5,
2, 3, 1, 3, 5, 2, 1), c(3, 3, 4, 1, 2, 3, 3, 2, 3, 4,
2), c(4, 3, 3, 5, 4, 5, 5, 4, 3, 3, 5, 3), c(11, 7, 5,
4, 11, 11, 5), c(2, 4, 2, 3, 5, 6, 3, 3), c(3, 3, 3,
3, 4, 4, 3, 3), c(3, 3, 2, 2, 2, 3, 4, 3, 2), c(5, 13,
3, 5, 7, 4, 3, 7), c(5, 8, 6, 5, 5, 6, 3, 5, 10), c(4,
3, 8, 4, 3, 6, 3), c(5, 5, 5, 2, 5, 3, 3, 3), c(3, 3,
3, 3, 5, 5, 4, 4, 5, 4, 3), c(3, 3, 4, 5, 4, 5, 5, 2),
c(5, 5, 1, 3, 3, 5, 5, 1), c(5, 1, 3, 5, 2, 3, 3), c(2,
3, 3, 3, 2, 2, 3), c(1, 2, 2), c(2, 2, 2), c(3, 6, 4,
2, 4), c(3, 3), c(2, 3), c(1, 4, 4, 2, 3, 5), c(6, 6,
6, 6, 6, 4, 6), 2, c(4, 2, 4, 4, 2, 2), c(1, 3, 3, 1,
2, 1, 3, 3, 2), 2, 3, c(4, 4, 6, 4), 2, c(2, 2, 5, 6,
4, 8), c(4, 3, 5, 5, 5), c(4, 4, 1, 4, 3, 6, 4, 5, 7),
c(6, 2, 5, 3, 1, 3, 1), c(3, 4, 3, 4, 2, 5, 3, 3, 5),
c(6, 7, 8, 7, 3, 8, 5, 10, 4, 5), c(6, 13, 3, 6, 8, 6,
14, 4, 5, 2, 4, 2), c(8, 8, 6, 6, 2, 2, 3, 5, 5), c(7,
4, 7, 4, 4, 6, 4, 4, 4), c(6, 7, 5, 7, 6, 8, 4, 7, 6,
11), c(2, 3, 3, 4, 2, 5, 3, 2, 3, 2), c(4, 3, 2, 3, 3,
3, 4, 3, 2, 4), c(3, 8, 7, 7, 4, 6, 4, 7, 3, 3), c(4,
9, 7, 6, 3, 2, 6, 3, 5)), `Song Thrush` = list(c(1, 1,
1, 1, 1, 1, 1, 2, 1, 1), c(1, 1, 1, 1, 1), c(1, 1, 1, 1,
1, 1, 1, 2), c(1, 1, 1, 1, 1, 1), 1, c(1, 1, 11), c(1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1), c(1, 1, 1, 1, 1, 1, 1, 1, 1,
1), c(1, 1, 1, 1), 1, c(1, 1, 1, 1, 1, 1, 1), c(1, 2, 1,
1, 1, 1, 1, 1), c(1, 1, 1, 1, 2, 1, 1, 1), c(1, 1, 1, 1,
1, 1, 1, 1, 1), c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), c(1,
1, 1, 1, 1, 1), c(2, 1, 1, 1, 2, 2, 2, 1, 1, 1), c(1, 1,
1, 1, 1, 1, 1, 1), c(1, 1, 1, 1, 1, 1, 1), c(1, 1, 1, 1,
1, 1), c(2, 1, 1, 1, 1, 1), c(2, 2, 1, 1), NULL, 1, 1, c(1,
1, 1), 1, c(1, 1, 2, 1), c(1, 2, 1, 2, 3, 1), NULL, c(1,
1), NULL, NULL, NULL, c(1, 2, 2), NULL, 1, c(1, 1), NULL,
NULL, NULL, NULL, NULL, NULL, 1, c(1, 1), 1, 1, c(1,
1, 2, 1, 2, 1, 1, 1, 1), NULL), Chaffinch = list(c(2,
1, 3, 3), c(2, 2, 2, 1, 2, 1), c(1, 2), c(1, 1, 3, 2, 2,
2, 2, 2, 1, 2), NULL, c(3, 3, 3, 2, 2, 2, 1), c(3, 1, 3,
2), NULL, c(5, 2, 2, 2, 4), c(2, 2), c(2, 2, 2, 2, 2, 3,
2, 2, 2, 2), c(5, 1, 3, 2, 3), c(2, 1, 1, 3), c(3, 2, 2,
2, 2, 3, 1), c(2, 2), c(2, 2, 2, 2, 2, 2, 2, 2, 2), c(2,
2, 3), c(2, 1, 2, 2, 2), c(4, 3, 2, 5, 2, 2), c(1, 3), c(3,
3, 1, 5, 1), c(2, 4, 5, 2, 2), c(2, 4, 2, 4, 1), c(6, 4,
4), c(5, 2, 4, 4, 5, 4, 4), c(6, 3, 4, 5, 4, 4, 3), c(4,
4, 4), c(4, 2, 6, 2), c(7, 6, 8, 8, 8, 4, 6, 4, 4, 4), c(10,
6), c(2, 6, 6, 4), c(4, 4, 5, 4, 4, 4, 5), c(10, 10, 10),
NULL, c(2, 2, 2, 4, 4, 4), NULL, c(6, 6, 5, 7, 3, 2),
c(4, 4, 2, 3, 10), c(1, 5, 3, 5, 4, 5, 3, 2, 4, 2), c(5,
4, 4, 2, 7, 6, 10, 2, 7, 2), c(2, 4, 2, 3, 4, 1, 4, 3,
1, 1), c(13, 7, 3, 6, 13, 9, 5, 7, 7, 11), c(10, 7, 9,
7, 9, 17, 11, 8, 4), c(1, 3, 3), c(1, 3, 4, 1, 1, 1,
2, 6, 4), c(5, 8, 6, 9, 9, 3, 11, 2, 5), c(2, 3, 3, 3,
2, 3), c(4, 3, 3, 5, 3, 4, 4, 4, 6, 3, 3, 3), c(3, 2,
3, 2, 3, 2, 4, 3, 2, 1, 2, 5, 3), c(12, 5, 12, 8, 18,
6, 3, 4, 9, 15, 7, 10)), Starling = list(c(1, 3), 1,
3, c(5, 5, 5, 5, 7, 7, 5), NULL, NULL, NULL, NULL, c(5,
9, 7, 5, 7, 7), NULL, c(1, 1, 2, 2, 1, 2), 3, NULL, NULL,
1, c(1, 3), c(1, 1), c(2, 2, 2, 2), c(5, 2, 1, 3, 7,
13, 1, 2, 2, 3), c(1, 2), NULL, NULL, NULL, NULL, NULL,
NULL, c(1, 1), NULL, 4, NULL, NULL, NULL, NULL, NULL,
c(4, 12), NULL, c(2, 28, 9, 2, 3, 9), c(3, 7, 8, 2, 3,
12, 3), c(2, 1, 6, 9, 18), c(11, 1, 5, 30, 10), c(25,
9, 8, 39, 20, 18, 30), c(15, 10, 9, 27, 14, 15, 30, 30,
19, 12), c(3, 8, 14, 2, 21, 19, 35), c(13, 8, 9, 21,
9, 28, 1, 5, 16), c(1, 2, 2, 1, 1, 8, 1), c(6, 27, 6,
25, 16, 10, 3, 40, 5, 30), c(2, 1, 3, 2, 3, 2, 1), c(6,
4, 24, 6, 8, 7, 9, 10), c(17, 3, 1, 11, 5, 5, 2, 6, 6,
5, 2, 3), c(2, 4, 1, 5, 3, 3, 14, 7, 5, 2, 6)), Goldfinch = list(
c(1, 3, 5, 1, 1), NULL, 2, NULL, NULL, NULL, c(2, 2,
2), c(3, 3, 3), c(2, 1), NULL, c(2, 3, 3, 2, 2, 3, 2),
NULL, NULL, NULL, NULL, NULL, NULL, NULL, c(2, 2, 2),
c(5, 3), 2, c(6, 10, 6, 6, 6, 8, 4), c(2, 6, 3), c(6,
4, 2), c(10, 10, 8, 8, 10, 10), c(1, 6, 6, 6, 1, 2, 6
), c(2, 2), c(2, 2, 4, 6), c(1, 4, 4, 4, 4, 8, 4, 6),
c(10, 8, 8), c(6, 6, 6, 2, 1), c(7, 8, 5, 8, 4), 10,
4, c(4, 3, 6, 4), 3, c(3, 5, 4, 2, 2, 6, 3), c(2, 4,
7, 6, 6, 6), c(10, 4, 6, 4, 5, 5, 5, 6), c(11, 15, 12,
9, 15, 8, 25), c(2, 1, 1, 1, 1, 1, 2, 1), c(23, 24, 12,
14, 20, 17, 13, 6, 18), c(18, 13, 19, 42, 10, 12, 21,
27, 7, 7), c(2, 2, 2, 1, 4), c(1, 5, 1, 7, 3, 3), c(6,
6, 18, 8, 6, 14, 16, 3, 7, 5, 4), c(8, 3, 1, 2, 2, 1,
1, 3, 1), c(1, 1, 1, 2, 6, 2), c(1, 1, 2, 2, 2, 1, 1),
c(12, 3, 6, 9, 9, 4)), Brambling = list(c(2, 2), NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, c(2,
2, 2, 2), NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, c(1, 2, 2, 1), NULL, NULL, 1,
NULL, NULL, NULL, 1), Blackcap = list(NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, 1, NULL, c(1, 2, 2), NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
1, NULL, c(1, 1, 1), NULL, NULL, 1, 1, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, 1, NULL, NULL, NULL, NULL, NULL,
NULL, 1, NULL, NULL, NULL, NULL, NULL, 1, NULL), Jackdaw = list(
2, c(1, 2), NULL, NULL, NULL, NULL, NULL, NULL, c(1,
1, 1, 2), NULL, c(1, 4, 1, 1, 4, 1, 1), 1, NULL, NULL,
c(6, 5, 5, 5, 5, 5), c(4, 2, 4, 1, 5, 1, 5, 1), c(7,
2, 5), c(1, 1, 1, 1), 1, NULL, NULL, NULL, NULL, c(4,
4), NULL, 2, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, c(2, 1), NULL, c(3, 5, 1, 2, 2), c(3, 2, 1, 1,
2, 4, 3, 3), c(3, 3, 6, 2, 9, 4, 10, 3, 2), c(6, 1, 1,
3, 4, 2, 3, 1), c(5, 3, 5, 4, 5, 4, 4), c(3, 2, 6, 5,
2, 3, 1, 3, 3, 4), c(6, 3, 2, 6, 2, 2, 3, 3, 3, 5, 5,
3), c(6, 5, 6, 5, 5, 8, 5, 4, 7, 6), c(5, 2, 3, 5, 4,
3, 3, 5, 2), c(3, 1, 2, 4, 2, 3, 1, 2), c(3, 5, 9, 4,
3, 5, 5, 5, 6, 5, 4, 5, 5), c(5, 1, 8, 6, 5, 6, 3, 3,
8, 6, 4), c(7, 6, 6, 6, 6, 5, 4, 3), c(3, 4, 2, 4, 2,
2, 2, 7, 11, 3, 6)), Siskin = list(NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, c(2, 2, 2),
NULL, NULL, NULL, NULL, NULL, NULL, NULL, c(1, 1), NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, c(2, 3), c(4,
2, 1, 2, 1), NULL, NULL, NULL, NULL, NULL, NULL, NULL,
1, c(2, 2, 2), 1, NULL, c(3, 3), 2, c(4, 1, 2, 3), c(1,
4, 2), c(1, 1, 1, 1, 2), c(3, 3), NULL, c(2, 2, 1), c(5,
1, 2, 2, 2, 2, 2, 2), NULL), `Spotted Flycatcher` = list(
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, 2, NULL, 1, NULL, NULL, NULL, NULL, NULL, c(1,
1), NULL, c(1, 1, 1), NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL)), row.names = c(NA, -50L
), class = c("tbl_df", "tbl", "data.frame"))
Is this what you need?
library(dplyr)
library(purrr) # map_dbl
# zz is the data frame from the dput() above
group_by(zz, year, Season, POSTCODE, LOC_ID) %>%
summarize_all(~ map_dbl(., sum, na.rm = TRUE)) %>%
ungroup()
# # A tibble: 50 x 25
# year Season POSTCODE LOC_ID Wren Dunnock `Blue Tit` `Pied/White Wag~ `Collared Dove` `Great Tit` Robin Greenfinch
# <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2010 Summer PE32 1TL LOC56~ 1 10 13 0 62 8 5 30
# 2 2010 Winter PE32 1TL LOC56~ 4 15 16 4 48 10 11 11
# 3 2011 Spring PE32 1TL LOC56~ 0 8 10 1 25 10 18 16
# 4 2011 Summer PE32 1TL LOC56~ 0 18 17 0 52 11 7 17
# 5 2011 Winter PE32 1TL LOC56~ 2 5 22 1 51 10 13 7
# 6 2012 Autumn PE32 1TL LOC56~ 2 6 28 1 24 18 4 9
# 7 2012 Winter PE32 1TL LOC56~ 4 10 18 6 43 16 6 22
# 8 2013 Winter PE32 1TL LOC56~ 2 6 20 2 23 8 5 14
# 9 2014 Autumn NR15 1TS LOC11~ 0 1 30 0 1 6 6 4
# 10 2014 Summer NR15 1TS LOC11~ 0 2 30 0 0 14 1 4
# # ... with 40 more rows, and 13 more variables: `House Sparrow` <dbl>, `Coal Tit` <dbl>, Woodpigeon <dbl>,
# # Blackbird <dbl>, `Song Thrush` <dbl>, Chaffinch <dbl>, Starling <dbl>, Goldfinch <dbl>, Brambling <dbl>,
# # Blackcap <dbl>, Jackdaw <dbl>, Siskin <dbl>, `Spotted Flycatcher` <dbl>
(You can do it without purrr::map_dbl, just use sapply in its place.)
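Following that suggestion, a base R sketch without purrr replaces every list-column by its per-cell sums. sum(NULL) is 0, so the NULL cells become 0 without a separate step. birds is a hypothetical miniature of zz with two list-columns.

```r
# Hypothetical miniature of zz: list-columns holding numeric vectors or NULL
birds <- data.frame(year = c(2018, 2019, 2017))
birds$Wren  <- I(list(c(1, 1, 1), NULL, 2))
birds$Robin <- I(list(NULL, c(2, 1), c(1, 1, 3)))

# Replace each list-column by the sum of each cell (NULL sums to 0)
list_cols <- vapply(birds, is.list, logical(1))
birds[list_cols] <- lapply(birds[list_cols], function(col) {
  vapply(col, sum, numeric(1), na.rm = TRUE)
})
birds
```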
We can use summarise() with across():
library(dplyr)
library(purrr)
zz %>%
  group_by(year, Season, POSTCODE, LOC_ID) %>%
  summarise(across(everything(), ~ map_dbl(., sum, na.rm = TRUE)), .groups = "drop")
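The list-columns arise because pivot_wider() finds several HOW_MANY values for the same cell. With tidyr 1.1.0 or later they can be avoided at the pivot step: values_fn = sum collapses duplicates into one number and values_fill = 0 fills combinations with no record. The sketch below uses a hypothetical long-format miniature of bird_case (only year, Season, ENGLISH_NAME and HOW_MANY).

```r
library(tidyr)

# Hypothetical long-format miniature of bird_case: one row per count
bird_long <- data.frame(
  year = c(2018, 2018, 2018, 2019),
  Season = c("Winter", "Winter", "Winter", "Winter"),
  ENGLISH_NAME = c("Wren", "Wren", "Robin", "Robin"),
  HOW_MANY = c(1, 2, 3, 4)
)

# values_fn = sum: one summed count per cell instead of a list
# values_fill = 0: species with no record in a row get 0 instead of NULL/NA
wide <- pivot_wider(bird_long,
                    names_from = ENGLISH_NAME, values_from = HOW_MANY,
                    values_fn = sum, values_fill = 0)
wide
```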

Plotly does not plot a non-square surface

I want to produce a 3D scatterplot and add a surface fitted with a linear regression, using plotly. My data:
structure(list(political_trust = c(1, 6, 7, 5, 0, 2, 1, 3, 5,
0, 2, 5, 5, 6, 6, 3, 3, 2, 5, 8, 3, 7, 3, 4, 5, 4, 5, 0, 0, 4,
6, 1, 0, 4, 0, 5, 5, 6, 7, 3, 5, 4, 5, 2, 4, 4, 7, 6, 7, 5, 4,
6, 7, 5, 7, 3, 3, 3, 2, 5, 2, 7, 3, 2, 7, 2, 3, 0, 7, 5, 7, 3,
0, 7, 2, 6, 3, 8, 7, 2, 2, 5, 0, 1, 6, 3, 6, 5, 1, 3, 4, 4, 5,
3, 3, 0, 2, 4, 9, 6, 3, 3, 2, 3, 4, 5, 8, 0, 4, 1, 5, 0, 4, 0,
5, 6, 3, 2, 7, 5, 4, 3, 8, 3, 4, 0, 3, 6, 7, 7, 2, 3, 5, 5, 5,
0, 3, 2, 1, 7, 5, 0, 4, 0, 2, 7, 3, 0, 8, 3, 2, 4, 5, 5, 3, 2,
3, 8, 6, 5, 6, 7, 0, NA, 7, 7, 2, 0, 3, 4, 7, 2, 1, 2, 0, 0,
4, 3, 3, 6, 6, 1, 4, 0, 4, 0, 0, 7, 6, 4, 4, 6, 5, 4, 3, 3, 0,
NA, 2, 5), political_interest = c(2, 0, 3, 3, 2, 1, 2, 2, 2,
2, 2, 2, 3, 3, 3, 3, 2, 2, 3, 2, 1, 2, 2, 2, 2, 0, 2, 1, 3, 1,
1, 1, 1, 1, 2, 3, 2, 2, 2, 1, 3, 3, 2, 3, 2, 1, 3, 2, 0, 3, 1,
1, 2, 1, 2, 2, 1, 3, 3, 2, 3, 2, 3, 2, 2, 1, 2, 0, 3, 1, 2, 2,
1, 3, 2, 2, 1, 2, 2, 0, 3, 2, 2, 1, 2, 1, 1, 3, 1, 1, 3, 2, 0,
2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 0, 1, 1, 2, 2, 2, 2,
2, 0, 0, 2, 3, 2, 2, 2, 3, 3, 0, 3, 3, 1, 2, 1, 1, 1, 2, 3, 2,
2, 2, 0, 2, 2, 2, 1, 2, 3, 3, 1, 2, 0, 1, 1, 0, 2, 2, 1, 2, 2,
2, 2, 3, 2, 1, 2, 2, 0, 0, 3, 2, 2, 2, 1, 2, 3, 0, 1, 2, 3, 2,
2, 2, 1, 3, 1, 1, 2, 2, 3, 3, 1, 2, 2, 2, 2, 2, 1, 0, 1, 1, 0,
3, 3), education_level = c(0, 2, 1, 5, 5, 0, 4, 4, 0, 0, 3, 2,
3, 4, 0, 4, 4, 4, 4, 3, 0, NA, 4, 0, 4, 3, 4, 1, 5, 2, NA, 0,
0, 4, 3, 3, 5, 3, 4, 0, 4, 4, 0, 4, 5, 4, 2, 2, 0, 5, 3, 0, 4,
1, 5, 4, 0, 4, 4, 5, 5, 4, 4, 4, 5, 2, 3, 2, 4, 0, 4, 0, 5, 4,
4, 4, 4, 4, 4, 2, 4, 5, 3, 4, 3, 0, 4, 4, 4, 3, 4, 4, 0, 3, 4,
2, 3, 3, 0, 4, 4, 4, 5, 4, 0, 4, 4, 4, 0, 3, 1, 4, NA, 4, 0,
1, 2, 4, 0, 2, 1, 4, 4, 4, 3, NA, 5, 2, 1, 0, 0, 4, 3, 3, 4,
3, 0, 3, NA, 4, 0, 0, 4, 5, 4, 5, 2, 2, 0, 3, 4, 3, 1, 3, 2,
3, 5, 0, 4, 5, 0, 5, 2, 0, 3, NA, NA, 2, 4, 3, 4, 3, 2, 2, 4,
4, 3, 0, 4, 0, 4, 4, 3, 0, 4, 4, 3, 5, 0, 3, 0, 4, 3, 0, 3, 3,
3, 4, 5, 1)), row.names = c(NA, -200L), class = "data.frame")
I start by defining a list of relevant variables. This is not strictly necessary; it is a consequence of using the code in a Shiny app:
input <- list()
input$x <- "education_level"
input$y <- "political_trust"
input$z <- "political_interest"
Next, creating the surface data:
# data: the data frame from the dput() above
library(reshape2)  # acast()
# Regressing "political_interest" on "education_level" and "political_trust":
lm_fit <- lm(as.formula(paste0(input$z, " ~ ", input$x, " + ", input$y)), data)
# Defining range of values that outcome will be predicted for
axis_x <- seq(min(data[, input$x], na.rm = TRUE),
              max(data[, input$x], na.rm = TRUE), by = 0.2)
axis_y <- seq(min(data[, input$y], na.rm = TRUE),
              max(data[, input$y], na.rm = TRUE), by = 0.2)
# Predicting outcome, and getting data into surface format
lm_surface <- expand.grid(x = axis_x, y = axis_y, KEEP.OUT.ATTRS = FALSE)
colnames(lm_surface) <- c(input$x, input$y)
lm_surface[[input$z]] <- predict(lm_fit, newdata = lm_surface)
lm_surface <- acast(lm_surface, as.formula(paste0(input$x, " ~ ", input$y)),
                    value.var = input$z)
Last, plotting this with plotly:
library(dplyr)
library(plotly)
data %>%
filter(!is.na(get(input$z))) %>%
filter(!is.na(get(input$x))) %>%
filter(!is.na(get(input$y))) %>%
plot_ly(., x = ~jitter(get(input$x), factor = 2.5),
y = ~jitter(get(input$y), factor = 2.5),
z = ~jitter(get(input$z), factor = 2.5),
type = "scatter3d", mode = "markers",
marker = list(size = 2, color = "#cccccc")) %>%
add_surface(., z = lm_surface,
x = axis_x,
y = axis_y,
type = "surface")
This gives me the following. As you can see, the surface does not cover the full range of the y dimension. Note also that the plotted surface is square (the same length in x and y), although the x and y ranges differ.
I can make plotly draw a larger surface area, e.g. by changing the range of values as below, but it always stays square.
axis_x <- seq(0, 10, by = 0.2)
axis_y <- seq(0, 10, by = 0.2)
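OK, question solved.
What matters is which dimension of the surface matrix (lm_surface) corresponds to which axis: plotly's surface trace reads the rows of z as y values and the columns as x values. Swapping x and y when applying acast() therefore fixes the issue: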
lm_surface <- acast(lm_surface, as.formula(paste0(input$y, " ~ ", input$x)),
value.var = input$z)
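A base R sketch of that layout rule: mtcars stands in for the question's data (wt as x, hp as y, mpg as z), and the prediction matrix is filled so that its rows follow axis_y and its columns follow axis_x, the layout that acast(y ~ x) is meant to produce. The grid sizes (20 and 30) are arbitrary.

```r
# mtcars stands in for the question's data: x = wt, y = hp, z = mpg
fit <- lm(mpg ~ wt + hp, data = mtcars)
axis_x <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)
axis_y <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 30)

# expand.grid() varies its first argument (x) fastest
grid <- expand.grid(wt = axis_x, hp = axis_y)
pred <- predict(fit, newdata = grid)

# Fill row by row: row i holds the predictions for axis_y[i] across all x,
# giving length(axis_y) rows and length(axis_x) columns
z <- matrix(pred, nrow = length(axis_y), ncol = length(axis_x), byrow = TRUE)
dim(z)
```

The matrix would then go into add_surface(z = z, x = axis_x, y = axis_y) as in the question, with a non-square grid now matching the axis ranges.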

R: how to count the number of times two elements have the same ID (perhaps using the outer function)

I have the following three dimensional array:
dput(a)
structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 2, 1, 1, 1, 2, 2,
2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 6, 2, 7, 6, 2, 7, 6, 2, 7, 4, 2, 4, 4, 2, 6, 4, 2, 4, 6, 2,
7, 4, 2, 6, 4, 2, 6, 4, 2, 6, 4, 2, 4, 4, 2, 6, 4, 2, 4, 4, 2,
6, 4, 2, 6, 4, 2, 6, 6, 2, 7, 4, 2, 6, 4, 2, 6, 4, 2, 4, 2, 3,
1, 2, 3, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 3, 7, 2, 3,
7, 2, 3, 7, 2, 3, 7, 2, 3, 7, 2, 3, 7, 2, 3, 7, 2, 3, 7, 2, 3,
7, 2, 3, 7, 1, 2, 5, 2, 3, 7, 1, 2, 4, 2, 3, 7, 2, 3, 7, 2, 3,
7, 2, 3, 7, 2, 3, 7, 2, 3, 7, 2, 3, 7, 2, 6, 3, 2, 6, 3, 2, 6,
3, 2, 6, 3, 2, 6, 3, 2, 6, 3, 2, 6, 3, 2, 6, 3, 2, 6, 3, 2, 6,
3, 1, 1, 1, 2, 6, 3, 1, 5, 5, 2, 6, 3, 2, 6, 3, 2, 6, 3, 2, 6,
3, 2, 6, 3, 2, 6, 3, 2, 6, 3, 3, 3, 2, 3, 3, 2, 3, 3, 2, 3, 13,
2, 3, 13, 2, 3, 5, 2, 3, 5, 2, 15, 17, 2, 15, 17, 2, 15, 17,
2, 3, 5, 2, 15, 17, 2, 3, 13, 2, 15, 17, 2, 15, 17, 2, 3, 13,
2, 3, 5, 2, 15, 17, 2, 15, 17, 2, 3, 5, 2), .Dim = c(3L, 20L,
6L), .Dimnames = list(c("cl.tmp", "cl.tmp", "cl.tmp"), NULL,
NULL))
The dimension of this array (a) is 3x20x6 (after edits).
I wanted to count the proportion of times that a[,i,] matches a[,j,] element-by-element in the matrix. Basically, I wanted to get mean(a[,i,] == a[,j,]) for all i, j, and I would like this to be fast while staying in R.
It occurred to me that the outer function might be a possibility but I am not sure how to specify the function. Any suggestions, or any other alternative ways?
The output would be a 20x20 symmetric matrix of nonnegative elements with 1 on the diagonals.
The solution given below works (thanks!) but I have one further question (sorry).
I would like to display the coordinates above in a heatmap. I try the following:
n<-dim(a)[2]
xx <- matrix(apply(a[,rep(1:n,n),]==a[,rep(1:n,each=n),],2,sum),nrow=n)/prod(dim(a)[-2])
image(1:20, 1:20, xx, xlab = "", ylab = "")
This gives me the following heatmap.
However, I would like to reorder the coordinates so that coordinates with high values among each other are displayed together. I would not like to bias the results by choosing the number of groups myself. I tried
hc <- hclust(as.dist(1-xx), method = "single")
but I cannot decide how to cut the resulting tree to group the coordinates together. Any suggestions? Basically, in the figure, I would like the coordinate pairs in the top-left (and bottom-right) off-diagonal blocks to be as low-valued (in this case, as red) as possible.
Looking around on SO, I found that there exists a function heatmap which might do this,
heatmap(xx,Colv=T,Rowv=T, scale='none',symm = T)
and I get the following:
which is all right, but I cannot figure out how to get rid of the dendrograms on the sides or the axis labels. It does work if I extract the ordering and do the following:
yy <- heatmap(xx,Colv=T,Rowv=T, scale='none',symm = T,keep.dendro=F)
image(1:20, 1:20, xx[yy$rowInd,yy$colInd], xlab = "", ylab = "")
so I guess that is what I will stick with. Here is the result:
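Because only a reordering is needed, the tree does not have to be cut: the $order component of an hclust object lists the leaves so that every cluster in the tree occupies a contiguous block. A sketch with a hypothetical 4 x 4 similarity matrix standing in for xx; note that heatmap() additionally reorders its dendrogram by row means, so its ordering may differ from plain hclust().

```r
# Hypothetical similarity matrix (1 on the diagonal) standing in for xx:
# items 1 and 3 are similar, items 2 and 4 are similar
xx <- matrix(c(1.0, 0.1, 0.9, 0.2,
               0.1, 1.0, 0.2, 0.8,
               0.9, 0.2, 1.0, 0.1,
               0.2, 0.8, 0.1, 1.0), nrow = 4, byrow = TRUE)

# Cluster on the dissimilarity 1 - xx; no cut needed, just the leaf order
hc <- hclust(as.dist(1 - xx), method = "single")
ord <- hc$order

# Same plot as in the question, with rows and columns reordered
image(seq_len(4), seq_len(4), xx[ord, ord], xlab = "", ylab = "")
```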
Try this:
n<-dim(a)[2]
matrix(apply(a[,rep(1:n,n),]==a[,rep(1:n,each=n),],2,sum),nrow=n)/prod(dim(a)[-2])
Note that the memory usage of this method grows with n^2, so you may run into trouble with larger arrays.
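The outer() route mentioned in the question can avoid building the large n^2-column array: flatten each slice a[, i, ] into one column of a matrix, then let outer() call a vectorized pairwise function. This is a sketch on a small hypothetical array; memory is then about one copy of a plus the n x n result, but Vectorize() loops in R, so it may be slower than the answer above for small n.

```r
# Hypothetical small array standing in for `a` (dimensions 2 x 3 x 2)
a <- array(c(1, 2, 1, 2, 3, 3,
             1, 1, 1, 1, 2, 1), dim = c(2, 3, 2))
n <- dim(a)[2]

# Column i of m is the slice a[, i, ] flattened to a vector
m <- apply(a, 2, as.vector)

# Proportion of matching elements between slices i and j
match_prop <- function(i, j) mean(m[, i] == m[, j])
xx <- outer(seq_len(n), seq_len(n), Vectorize(match_prop))
xx
```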

Generate a sequence number (1,1,1,2,2,2,3,3,3) within groups of different length

I have a data frame with a column "Tag", here with four different levels. I need help to create the "Seq" column, a sequence generated from the "Tag" Column:
df <- data.frame(Tag = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4),
Seq = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3))
Each "Tag" should be divided into 3 sub-groups defined by "Seq". We need to generate runs of 1, 2, and 3 whose total length equals the length of each "Tag". Thus, the length of each run of 1, 2, and 3 depends on the length of each "Tag".
Note that the length of each "Tag" differs. For example, Tag 1 has length 31, and its "Seq" is 10 times 1, 10 times 2, and 11 times 3.
To begin with, Tag 1 has length 31 while Tag 2 has length 32. In the code below, the run of 1s is never longer than the runs of 2s and 3s, because the lengths of the last two runs are computed with ceiling(). There is no clear criterion for what the code should do when the length is not divisible by 3 (e.g. 31/3): should the runs be 10, 10, 11, or is 9, 11, 11 also fine? The code gives 9, 11, 11:
ec <- table(df$Tag)  # length of each Tag
unlist(mapply(function(x, y) rep(c(1, 2, 3), c(x, y, y)), ec - 2 * ceiling(ec / 3), ceiling(ec / 3)))
To check the output, save the mapply() result in a variable, d <- mapply(...),
and then run sapply(d, table).
Hope this will be of help.
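That check can be run as below; the Tag column is rebuilt here from the group lengths stated in the question (31, 32, 28, 27) rather than copied from df.

```r
# Hypothetical Tag column with the group lengths from the question
Tag <- rep(1:4, times = c(31, 32, 28, 27))
ec <- table(Tag)

# Same mapply() call as above, kept as a list (one element per Tag)
d <- mapply(function(x, y) rep(c(1, 2, 3), c(x, y, y)),
            ec - 2 * ceiling(ec / 3), ceiling(ec / 3))

# Run lengths: rows are the Seq values 1, 2, 3; columns are the Tags
sapply(d, table)
```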
with(df, ave(Tag, Tag, FUN = function(x) {sort(rep(x = 1:3, length.out = length(x)))}))
Explanation: for each level of "Tag" (ave(Tag, Tag, ...)), repeat the "Seq" values 1:3 (x = 1:3) up to the length of that subset of "Tag" (length.out = length(x)), then sort the numbers. For a length of 31 this gives runs of 11, 10, 10.
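If the split should be as equal as possible with any remainder going to the later groups (10, 10, 11 for length 31, as for Tag 1 in the question), one option is to give position k of n the value ceiling(3 * k / n) inside ave(). This is a sketch on a Tag column rebuilt from the question's group lengths; it does not reproduce the question's Seq column for Tags 2 to 4, which follows a different split.

```r
# Hypothetical Tag column with the group lengths from the question
df <- data.frame(Tag = rep(1:4, times = c(31, 32, 28, 27)))

# Within each Tag, position k of n gets group ceiling(3 * k / n):
# group sizes differ by at most 1 and the extra elements go to groups 2 and 3
df$Seq <- ave(df$Tag, df$Tag,
              FUN = function(x) ceiling(3 * seq_along(x) / length(x)))

# Run lengths per Tag (rows = Seq value, columns = Tag)
sapply(split(df$Seq, df$Tag), table)
```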
