Writing a function for summary statistics in R

I'm having a problem I can't figure out... Basically I want to generate mean, SD, and N per group for a number of variables. My data looks like this:
dataSet <- data.frame(study_id=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
Timepoint=c(1,6,12,18,1,6,12,18,1,6,12,18,1,6,12,18),
Secretor=c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1),
Gene1=runif(16, min=0, max=100),
Gene2=runif(16, min=0, max=100),
Gene3=runif(16, min=0, max=100),
Gene4=runif(16, min=0, max=100))
Then I group it...
library(tidyverse)
grouped_dataSet <- dataSet %>%
group_by(Secretor, Timepoint)
When I run the following line of code, I get what I want:
summarise(grouped_dataSet, mean = mean(Gene1, na.rm=T), sd = sd(Gene1, na.rm=T), n = n())
Output:
# A tibble: 8 x 5
# Groups: Secretor [2]
Secretor Timepoint mean sd n
<dbl> <dbl> <dbl> <dbl> <int>
1 0 1 21.8 18.6 2
2 0 6 34.8 33.2 2
3 0 12 43.1 4.34 2
4 0 18 72.6 38.0 2
5 1 1 13.3 15.3 2
6 1 6 41.2 22.8 2
7 1 12 44.9 25.7 2
8 1 18 37.0 8.49 2
However, when I write this same line of code as a function (which I'm intending to then map onto many columns using tidyverse's purrr package), it doesn't work, instead returning "NA" for everything except the n column:
summary_function <- function(x) {
summary <- summarise(grouped_dataSet, mean = mean(x, na.rm=T), sd = sd(x, na.rm=T), n = n())
return(summary)
}
summary_function("Gene1")
Output:
# A tibble: 8 x 5
# Groups: Secretor [2]
Secretor Timepoint mean sd n
<dbl> <dbl> <dbl> <dbl> <int>
1 0 1 NA NA 2
2 0 6 NA NA 2
3 0 12 NA NA 2
4 0 18 NA NA 2
5 1 1 NA NA 2
6 1 6 NA NA 2
7 1 12 NA NA 2
8 1 18 NA NA 2
This is the warning I get:
In var(if (is.vector(x) || is.factor(x)) x else as.double(x), ... :
NAs introduced by coercion
Could anyone please provide advice as to why it works as a line of code, but not as a function?
Many thanks in advance.

@akrun's suggestion for how to immediately solve your question is spot on.
An alternative is to use the nesting functionality of tidyr: have the function return a single-element list that contains a tibble of your results.
summary_function <- function(x) {
summary <- list(tibble(mean = mean(x, na.rm=T), sd = sd(x, na.rm=T), n = length(x[!is.na(x)])))
return(summary)
}
Then you can use across to apply the same function to multiple columns:
dataSet %>%
group_by(Secretor, Timepoint) %>%
summarize(across(Gene1:Gene4, summary_function))
# A tibble: 8 x 6
# Groups: Secretor [2]
# Secretor Timepoint Gene1 Gene2 Gene3 Gene4
# <dbl> <dbl> <list> <list> <list> <list>
#1 0 1 <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]>
#2 0 6 <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]>
#3 0 12 <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]>
#4 0 18 <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]>
#5 1 1 <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]>
#6 1 6 <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]>
#7 1 12 <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]>
#8 1 18 <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]> <tibble [1 × 3]>
Now we can unnest those same columns using unnest with names_sep =:
dataSet %>%
group_by(Secretor, Timepoint) %>%
summarize(across(Gene1:Gene4, summary_function)) %>%
unnest(Gene1:Gene4, names_sep = "_")
# A tibble: 8 x 14
# Groups: Secretor [2]
# Secretor Timepoint Gene1_mean Gene1_sd Gene1_n Gene2_mean Gene2_sd Gene2_n Gene3_mean Gene3_sd Gene3_n
# <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <int> <dbl> <dbl> <int>
#1 0 1 71.2 28.6 2 62.3 27.0 2 28.4 33.3 2
#2 0 6 5.40 7.43 2 58.6 29.1 2 37.0 33.9 2
#3 0 12 91.8 11.4 2 53.9 31.0 2 33.2 46.0 2
#4 0 18 51.5 65.0 2 65.3 40.2 2 63.8 32.7 2
#5 1 1 30.8 18.0 2 50.0 19.9 2 22.8 6.71 2
#6 1 6 63.9 49.2 2 59.9 41.8 2 30.9 39.5 2
#7 1 12 85.3 6.74 2 51.0 41.1 2 28.5 22.9 2
#8 1 18 41.7 44.8 2 80.2 24.0 2 64.7 17.4 2
## … with 3 more variables: Gene4_mean <dbl>, Gene4_sd <dbl>, Gene4_n <int>
across() and the names_sep argument of unnest() are relatively recent additions (dplyr >= 1.0.0 and tidyr >= 1.0.0, respectively), but they can come in handy.
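If the nested intermediate step is not needed, a similar wide result can be sketched by passing a named list of functions to across(). This is an alternative to the list-returning function above, not the answerer's method; the small dataSet and the .names pattern below are assumptions made for illustration.

```r
library(dplyr)

# Small stand-in for the question's data (two genes only; values are random)
set.seed(42)
dataSet <- data.frame(
  Secretor  = rep(c(0, 1), each = 4),
  Timepoint = rep(c(1, 6), times = 4),
  Gene1     = runif(8, min = 0, max = 100),
  Gene2     = runif(8, min = 0, max = 100)
)

# One summary column per gene/statistic pair, named e.g. Gene1_mean
wide_summary <- dataSet %>%
  group_by(Secretor, Timepoint) %>%
  summarise(
    across(
      Gene1:Gene2,
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        sd   = ~ sd(.x, na.rm = TRUE),
        n    = ~ sum(!is.na(.x))   # count of non-missing values
      ),
      .names = "{.col}_{.fn}"
    ),
    .groups = "drop"
  )

wide_summary
```

The trade-off is that the statistics are defined inline rather than in a reusable function, which may or may not suit your workflow.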

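The function fails because the string "Gene1" itself is passed to mean() and sd(), which return NA for a character vector. We can use ensym so that the argument can be passed either quoted or unquoted; it is converted to a symbol and then evaluated in the data with !!: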
summary_function <- function(x) {
x <- ensym(x)
summarise(grouped_dataSet,
mean = mean(!! x, na.rm=T), sd = sd(!!x, na.rm=T), n = n())
}
summary_function("Gene1")
# A tibble: 8 x 5
# Groups: Secretor [2]
# Secretor Timepoint mean sd n
# <dbl> <dbl> <dbl> <dbl> <int>
#1 0 1 69.4 2.25 2
#2 0 6 9.67 13.6 2
#3 0 12 39.5 10.6 2
#4 0 18 17.4 19.2 2
#5 1 1 41.0 54.0 2
#6 1 6 58.5 7.57 2
#7 1 12 75.5 1.42 2
#8 1 18 80.5 24.7 2
summary_function(Gene1)
# A tibble: 8 x 5
# Groups: Secretor [2]
# Secretor Timepoint mean sd n
# <dbl> <dbl> <dbl> <dbl> <int>
#1 0 1 69.4 2.25 2
#2 0 6 9.67 13.6 2
#3 0 12 39.5 10.6 2
#4 0 18 17.4 19.2 2
#5 1 1 41.0 54.0 2
#6 1 6 58.5 7.57 2
#7 1 12 75.5 1.42 2
#8 1 18 80.5 24.7 2
Also, for reusability across different datasets, it may be better to add an argument that takes the dataset object.
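A minimal sketch of that idea, assuming the column name is passed as a string and looked up with the .data pronoun instead of ensym (a swapped-in technique that is convenient when mapping over column names with purrr). The small dataSet and the names grouped, genes and results are made up for illustration.

```r
library(dplyr)
library(purrr)

# Small stand-in for the question's data (values are random)
set.seed(42)
dataSet <- data.frame(
  Secretor  = rep(c(0, 1), each = 4),
  Timepoint = rep(c(1, 6), times = 4),
  Gene1     = runif(8, min = 0, max = 100),
  Gene2     = runif(8, min = 0, max = 100)
)

# `data` is any (grouped) data frame, `col` is a column name given as a string
summary_function <- function(data, col) {
  data %>%
    summarise(
      mean = mean(.data[[col]], na.rm = TRUE),
      sd   = sd(.data[[col]], na.rm = TRUE),
      n    = n(),
      .groups = "drop"
    )
}

grouped <- dataSet %>% group_by(Secretor, Timepoint)
genes   <- c("Gene1", "Gene2")

# One summary table per gene, returned as a named list
results <- map(set_names(genes), ~ summary_function(grouped, .x))
results$Gene1
```

Because the dataset is an argument, the same function can be reused on another grouped data frame without relying on a global grouped_dataSet.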

Related

Rolling Window Regression by group in R (with dates)

I have panel data in R and want to run a rolling-window linear regression by group. The dates are numbered from 1 to 618; each number represents one date, and there is more than one observation per date.
I want a rolling window of 20 dates: output all coefficients of lm(y~x1+x2+x3+x4+x5+x6) for the period 1:20, then roll the window and run another regression for 2:21, 3:22, and so on over all my observations, so that the last coefficients are for the period 599:618 (there are 618 dates, so I can't do this manually).
My problem is that when I select a window of 20 observations, I only get the first 20 observations, for example:
1
1
1
1
1
1
1 .... 1
It may be that the first 20 observations all belong to the first date (1), because there is more than one observation per date. So I want the window to cover 20 dates rather than 20 observations; each window will then contain more than 20 observations, because I want to roll by date (date 1 to date 20), regardless of the number of observations.
After that, I need to estimate with the Newey-West method, so the final code needs to include something like the following and output all coefficients and t-statistics.
neweywest <- coeftest(LMOBJECT, vcov. = NeweyWest, lag=12)
I hope it has been understood well.
You can create multiple linear models for a given interval of dates like this:
library(tidyverse)
# example data
set.seed(1337)
n_dates <- 10
data <- tibble(
date = runif(100, min = 1, max = n_dates) %>% floor(),
x1 = runif(100)**2,
x2 = runif(100) * 2,
x3 = runif(100) + 2,
y = x1 + 2 * x2 + runif(100)
) %>%
arrange(date)
data
#> # A tibble: 100 × 5
#> date x1 x2 x3 y
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.754 0.700 2.21 2.79
#> 2 1 0.0230 1.97 2.70 4.89
#> 3 1 0.388 0.500 2.21 1.54
#> 4 1 0.225 0.135 2.87 0.849
#> 5 1 0.00000810 0.139 2.22 1.12
#> 6 1 0.255 0.893 2.21 2.25
#> 7 1 0.402 1.37 2.06 3.51
#> 8 1 0.00275 0.363 2.68 0.984
#> 9 2 0.238 1.68 2.53 3.98
#> 10 2 0.0309 1.47 2.05 3.69
#> # … with 90 more rows
# number of rows per day
data %>% count(date)
#> # A tibble: 9 × 2
#> date n
#> <dbl> <int>
#> 1 1 8
#> 2 2 10
#> 3 3 15
#> 4 4 8
#> 5 5 10
#> 6 6 10
#> 7 7 12
#> 8 8 7
#> 9 9 20
# size of rolling window in days
window_size <- 3
models <- tibble(
from = seq(n_dates),
to = from + window_size - 1
) %>%
mutate(
data = from %>% map2(to, ~ data %>% filter(date >= .x & date <= .y)),
model = data %>% map(possibly(~ lm(y ~ x1 + x2 + x3, data = .x), NA))
)
models
#> # A tibble: 10 × 4
#> from to data model
#> <int> <dbl> <list> <list>
#> 1 1 3 <tibble [33 × 5]> <lm>
#> 2 2 4 <tibble [33 × 5]> <lm>
#> 3 3 5 <tibble [33 × 5]> <lm>
#> 4 4 6 <tibble [28 × 5]> <lm>
#> 5 5 7 <tibble [32 × 5]> <lm>
#> 6 6 8 <tibble [29 × 5]> <lm>
#> 7 7 9 <tibble [39 × 5]> <lm>
#> 8 8 10 <tibble [27 × 5]> <lm>
#> 9 9 11 <tibble [20 × 5]> <lm>
#> 10 10 12 <tibble [0 × 5]> <lgl [1]>
models %>%
filter(!is.na(model)) %>%
transmute(
from, to,
coeff = model %>% map(coefficients),
r2 = model %>% map_dbl(~ .x %>% summary() %>% pluck("r.squared"))
) %>%
unnest_wider(coeff)
# A tibble: 9 x 7
# from to `(Intercept)` x1 x2 x3 r2
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 3 0.601 0.883 2.07 -0.0788 0.970
#2 2 4 0.766 0.965 2.01 -0.141 0.965
#3 3 5 0.879 0.954 1.94 -0.165 0.953
Another way of subsetting groups is to use nest:
# get all observations from the 3rd to the 5th date (days 3 to 5 here)
data %>% arrange(date) %>% nest(data = -date) %>% slice(3:5) %>% unnest(data)
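The question also asks for Newey-West t-statistics per window, which the code above does not cover. The following is a rough sketch of one way to do it with sandwich::NeweyWest and lmtest::coeftest on date-based windows, using base R for the windowing. The data, the two-predictor formula, window_size and the output column names are assumptions for illustration; lag = 12 is taken from the question.

```r
library(sandwich)  # NeweyWest()
library(lmtest)    # coeftest()

# Made-up panel: 10 dates with 8 observations each
set.seed(1)
n_dates <- 10
d <- data.frame(date = rep(seq_len(n_dates), each = 8))
d$x1 <- rnorm(nrow(d))
d$x2 <- rnorm(nrow(d))
d$y  <- 1 + 2 * d$x1 - d$x2 + rnorm(nrow(d))

window_size <- 5   # window length in dates, not in observations
nw_lag      <- 12  # lag used in the question
starts      <- seq_len(n_dates - window_size + 1)  # only complete windows

nw_results <- lapply(starts, function(from) {
  to  <- from + window_size - 1
  win <- d[d$date >= from & d$date <= to, ]          # all rows in the date window
  fit <- lm(y ~ x1 + x2, data = win)
  ct  <- coeftest(fit, vcov. = NeweyWest(fit, lag = nw_lag))
  data.frame(
    from     = from,
    to       = to,
    term     = rownames(ct),
    estimate = ct[, "Estimate"],
    t_value  = ct[, "t value"],
    row.names = NULL
  )
})

# One row per window and coefficient
nw_table <- do.call(rbind, nw_results)
head(nw_table)
```

The same coeftest call could be mapped over the model list-column built earlier, if you prefer to stay with the tidyverse approach.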

Extract data from a nested dataframe into the same record

I have this main dataframe:
testdataframe
id sensors_data
<chr> <list>
1 AA <data.frame [6 × 4]>
2 BB <data.frame [6 × 4]>
and every dataframe of sensors_data looks like this:
id type value status
<chr><chr> <dbl> <int>
1 SN01TP a 25.800 1
2 SN01HU b 40.000 1
3 SN02VD c 1.146 1
4 SN02C2 d 1270.000 1
5 SY01DS e 31.000 1
6 TD01TP f 22.500 1
I would want my main data frame to be, instead of only sensors_data, something like this:
a b c d e f
1 25.800 40.000 1.146 1270.000 31.000 22.500
I've tried unnesting the main data frame but that would create a record for each field. What I'm trying is to mutate the main data frame accessing the data inside sensors_data, but I can't figure out how.
Using purrr::map and tidyr::pivot_wider, you can do this. Use bind_rows if you want one dataframe.
df$pivoted <- df$sensors_data %>%
map(~ tidyr::pivot_wider(.[,c("type","value")], names_from = type))
df$pivoted
[[1]]
a b c d e f
1 25.8 40 1.15 1270 31 22.5
[[2]]
a b c d e f
1 25.8 40 1.15 1270 31 22.5
df
# A tibble: 2 x 3
id sensors_data pivoted
<chr> <list> <list>
1 AA <df [6 x 4]> <tibble [1 x 6]>
2 BB <df [6 x 4]> <tibble [1 x 6]>
Or, with bind_rows and bind_cols:
df$sensors_data %>%
map(~ tidyr::pivot_wider(.[,c("type","value")], names_from = type)) %>%
bind_rows() %>%
bind_cols(df,.)
# A tibble: 2 x 8
id sensors_data a b c d e f
<chr> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AA <df [6 x 4]> 25.8 40 1.15 1270 31 22.5
2 BB <df [6 x 4]> 25.8 40 1.15 1270 31 22.5
Data:
df1 <- read.table(header = T, text=" id type value status
1 SN01TP a 25.800 1
2 SN01HU b 40.000 1
3 SN02VD c 1.146 1
4 SN02C2 d 1270.000 1
5 SY01DS e 31.000 1
6 TD01TP f 22.500 1")
df <- tibble(id = c("AA", "BB"), sensors_data = list(df1,df1))
> df
id sensors_data
1 AA <df [6 x 4]>
2 BB <df [6 x 4]>
My favorite answer is already provided by Maël!
Here is an alternative using lapply
library(dplyr)
library(tidyr)
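# df is the nested data frame (called testdataframe in the question)
# keep only the type and value columns of each nested data frame
sensors_data_sub <- lapply(df$sensors_data, function(x) x[, 2:3])
# reshape each one into a single wide row
sensors_data_sub_wide <- lapply(sensors_data_sub,
function(x) pivot_wider(x, names_from = type, values_from = value))
bind_rows(sensors_data_sub_wide)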
a b c d e f
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 25.8 40 1.15 1270 31 22.5
2 25.8 40 1.15 1270 31 22.5

Arrange values within a specific group

I'm trying to arrange values in decreasing order within a specific group in a nested dataframe. My input data looks like this. I've got two grouping variables (group1 and group2) and three value columns (id, value2, value3).
library(tidyverse)
set.seed(1234)
df <- tibble(group1 = c(rep(LETTERS[1:3], 4)),
group2 = c(rep(0, 6), rep(2, 6)),
value2 = rnorm(12, 20, sd = 10),
value3 = rnorm(12, 20, sd = 50)) %>%
group_by(group1) %>%
mutate(id = c(1:4)) %>%
ungroup()
I decided to group them by group1 and group2 and then nest():
df_nested <- df %>%
group_by(group1, group2) %>%
nest()
# A tibble: 6 x 3
# Groups: group1, group2 [6]
group1 group2 data
<chr> <dbl> <list>
1 A 0 <tibble [2 x 3]>
2 B 0 <tibble [2 x 3]>
3 C 0 <tibble [2 x 3]>
4 A 2 <tibble [2 x 3]>
5 B 2 <tibble [2 x 3]>
6 C 2 <tibble [2 x 3]>
Perfect. Now I need to sort by id only those data for which group2 is equal to 2. However, I'm receiving the following error:
df_nested %>%
mutate(data = map2_df(.x = data, .y = group2,
~ifelse(.y == 2, arrange(-.x$id),
.x)))
Error: Argument 1 must have names
You could do :
library(dplyr)
library(purrr)
# sort by descending id only where group2 is 2; leave other groups untouched
df_nested$data <- map2(df_nested$data, df_nested$group2,
~ if (.y == 2) arrange(.x, desc(id)) else .x)
So data where group2 is not equal to 2 is not sorted
df_nested$data[[1]]
# A tibble: 2 x 3
# value2 value3 id
# <dbl> <dbl> <int>
#1 13.1 -89.0 1
#2 9.76 -3.29 2
and where group2 is 2 is sorted.
df_nested$data[[4]]
# A tibble: 2 x 3
#value2 value3 id
# <dbl> <dbl> <int>
#1 15.0 -28.4 4
#2 31.0 -22.8 3
If you want to combine them do :
map2_df(df_nested$data, df_nested$group2, ~ if (.y == 2) arrange(.x, desc(id)) else .x)
I would suggest creating an additional variable id_ which is equal to the original id variable when group2 == 2 and NA otherwise. This way, using it in sorting has no effect when group2 != 2.
df %>%
mutate(id_ = if_else(group2 == 2, id, NA_integer_)) %>%
arrange(group1, group2, -id_)
#> # A tibble: 12 x 6
#> group1 group2 value2 value3 id id_
#> <chr> <dbl> <dbl> <dbl> <int> <int>
#> 1 A 0 17.6 50.2 1 NA
#> 2 A 0 33.8 -14.4 2 NA
#> 3 A 2 23.1 22.6 4 4
#> 4 A 2 13.7 50.2 3 3
#> 5 B 0 15.4 49.9 1 NA
#> 6 B 0 16.2 63.7 2 NA
#> 7 B 2 41.7 -2.90 4 4
#> 8 B 2 16.6 46.7 3 3
#> 9 C 0 19.9 -64.3 1 NA
#> 10 C 0 19.9 59.7 2 NA
#> 11 C 2 34.1 48.5 4 4
#> 12 C 2 32.3 23.1 3 3
Then if needed we can group and nest the result.
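A minimal sketch of that last step, assuming a simplified version of the data (a single value column and two levels of group1, both made up for illustration). The helper column is dropped before nesting so it does not end up inside the nested tibbles.

```r
library(dplyr)
library(tidyr)

# Simplified stand-in for the question's df
df <- tibble(
  group1 = rep(c("A", "B"), each = 4),
  group2 = rep(c(0, 0, 2, 2), times = 2),
  id     = rep(1:4, times = 2),
  value  = c(10, 20, 30, 40, 50, 60, 70, 80)
)

df_nested <- df %>%
  # helper column: id where group2 == 2, NA elsewhere
  mutate(id_ = if_else(group2 == 2, id, NA_integer_)) %>%
  arrange(group1, group2, desc(id_)) %>%
  select(-id_) %>%
  # nest everything except the grouping columns
  nest(data = c(id, value))

df_nested$data[[1]]  # group1 == "A", group2 == 0: original order
df_nested$data[[2]]  # group1 == "A", group2 == 2: descending id
```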

Divide list of tibbles by a list of tibbles in R using dplyr

Suppose I have a nested tibble in the following format:
# A tibble: 3 x 3
AccountNumber Tibble1 Tibble2
<int> <list> <list>
1 1 <tibble [1 x 3]> <tibble [1 x 3]>
2 2 <tibble [1 x 3]> <tibble [1 x 3]>
3 3 <tibble [1 x 3]> <tibble [1 x 3]>
This can be generated by the following code:
library(tidyverse)
tibble1 <- tibble(AccountNumber = 1:3, A_1 = 1, B_1 = 2, C_1 = 3) %>%
group_by(AccountNumber) %>%
nest(.key = "Tibble1")
tibble2 <- tibble(AccountNumber = 1:3, A_2 = 4, B_2 = 5, C_2 = 6) %>%
group_by(AccountNumber) %>%
nest(.key = "Tibble2")
tibble_joined <- left_join(tibble1, tibble2, by = "AccountNumber")
How would I create a third list of tibbles by dividing Tibble1 by Tibble 2?
Essentially I would like the following format:
# A tibble: 3 x 3
AccountNumber Tibble1 Tibble2 Tibble3(Tibble2 / Tibble1)
<int> <list> <list> <list>
1 1 <tibble [1 x 3]> <tibble [1 x 3]> <tibble [1 x 3]>
2 2 <tibble [1 x 3]> <tibble [1 x 3]> <tibble [1 x 3]>
3 3 <tibble [1 x 3]> <tibble [1 x 3]> <tibble [1 x 3]>
...where Tibble3 is simply the ratio of Tibble 2 over Tibble 1 of:
Every column for
Every Account Number
My attempts so far have been:
tibble_joined %>%
group_by(AccountNumber) %>%
mutate(Tibble3 = tibble(tibble2 / tibble1))
and
tibble_joined %>%
group_by(AccountNumber) %>%
summarise(Tibble3 = tibble2 / tibble1)
which both give this error:
Error in mutate_impl(.data, dots) :
Evaluation error: non-numeric argument to binary operator.
I've tried to find elegant solutions to this problem, but I can't find anything.
=========================================================================
I'm fully aware that my problem can be solved by the following:
tibble_joined %>%
group_by(AccountNumber) %>%
unnest(cols = c(Tibble1, Tibble2)) %>%
mutate(A_Ratio = A_2 / A_1,
B_Ratio = B_2 / B_1,
C_Ratio = C_2 / C_1)
...which generates the following:
# A tibble: 3 x 10
# Groups: AccountNumber [3]
AccountNumber A_1 B_1 C_1 A_2 B_2 C_2 A_Ratio B_Ratio C_Ratio
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1.00 2.00 3.00 4.00 5.00 6.00 4.00 2.50 2.00
2 2 1.00 2.00 3.00 4.00 5.00 6.00 4.00 2.50 2.00
3 3 1.00 2.00 3.00 4.00 5.00 6.00 4.00 2.50 2.00
...but this seems cumbersome, and will get annoying with many columns.
We can use map2 from purrr to divide one tibble over another
library(purrr)
res <- tibble_joined %>%
mutate(Tibble3 = map2(Tibble1, Tibble2, ~ as_tibble(.y / .x) %>%
# funs() is deprecated; rename_with() takes an ordinary function
rename_with(function(nm) sub("_.*", "_ratio", nm))))
res
# A tibble: 3 x 4
# AccountNumber Tibble1 Tibble2 Tibble3
# <int> <list> <list> <list>
#1 1 <tibble [1 x 3]> <tibble [1 x 3]> <tibble [1 x 3]>
#2 2 <tibble [1 x 3]> <tibble [1 x 3]> <tibble [1 x 3]>
#3 3 <tibble [1 x 3]> <tibble [1 x 3]> <tibble [1 x 3]>
res$Tibble3
#[[1]]
# A tibble: 1 x 3
# A_ratio B_ratio C_ratio
# <dbl> <dbl> <dbl>
#1 4.00 2.50 2.00
#[[2]]
# A tibble: 1 x 3
# A_ratio B_ratio C_ratio
# <dbl> <dbl> <dbl>
#1 4.00 2.50 2.00
#[[3]]
# A tibble: 1 x 3
# A_ratio B_ratio C_ratio
# <dbl> <dbl> <dbl>
#1 4.00 2.50 2.00
NOTE: purrr is part of the tidyverse packages

List-columns in tibbles: Can I link a list-column with another list-column?

This is my first post, so please excuse me if I sound silly or the answer I am looking for already exists.
My main problem is this: I have created a tibble containing 4 columns (a character column, two data columns and a column containing a distance matrix for each of the levels of the character column) and I am trying to create a function that uses the distance matrices from the 4th column as a dependent variable and some independent variables from the second column. The problem is that R keeps warning me that it cannot find the dependent variable.
The packages I've used are the following:
library(easypackages)
libraries('tidyverse', 'broom')
The tibble containing my IVs looks like this:
IVs_tibble
# A tibble: 175 × 8
Site Region IV.1 IV.2 IV.3 IV.4 IV.5 IV.6
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Site.1 A 387 169 460 234 137 445
2 Site.2 A 197 172 449 192 141 422
3 Site.3 A 86 179 432 78 147 398
4 Site.4 A 14 183 404 4 152 375
5 Site.5 B 86 179 407 80 148 382
6 Site.6 B 18 175 422 154 146 397
7 Site.7 C 132 172 429 211 142 413
8 Site.8 C 99 178 404 120 147 385
9 Site.9 D 73 177 409 150 146 382
10 Site.10 D 77 175 417 182 145 383
# ... with 165 more rows
I then nest it:
by_region <- IVs_tibble %>% group_by(Region) %>% nest()
And here's how it looks:
by_region
# A tibble: 6 × 2
Region data
<chr> <list>
1 A <tibble [60]>
2 B <tibble [84]>
3 C <tibble [10]>
4 D <tibble [6]>
5 E <tibble [13]>
6 F <tibble [2]>
Subsequently, I create another tibble containing raw presence/absence data:
regions
# A tibble: 175 × 984
Region Site Taxon.1 Taxon.2 Taxon.3
<chr> <chr> <dbl> <dbl> <dbl>
1 A Site.1 1 1 0
2 A Site.1 0 1 0
3 B Site.1 1 1 1
4 B Site.1 0 0 0
5 C Site.1 1 0 1
6 C Site.1 0 0 1
7 D Site.1 1 0 0
8 D Site.1 1 1 0
9 D Site.1 0 0 0
10 F Site.10 0 1 0
# ... with 165 more rows, and 982 more variables: (these contain taxa names)
Then I nest that tibble too:
rg <- regions %>% group_by(Region) %>% nest()
And it looks like:
rg
# A tibble: 6 × 2
Region data
<chr> <list>
1 A <tibble [60]>
2 B <tibble [84]>
3 C <tibble [10]>
4 D <tibble [6]>
5 E <tibble [13]>
6 F <tibble [2]>
And I rename the data column in order to join it with the tibble containing the IVs:
rr <- rg %>% rename(Communities = data)
rr
# A tibble: 6 × 2
Region Communities
<chr> <list>
1 A <tibble [60]>
2 B <tibble [84]>
3 C <tibble [10]>
4 D <tibble [6]>
5 E <tibble [13]>
6 F <tibble [2]>
As a following step, I construct a function to compute the matrices:
betamatrices <-function(df){vegan::betadiver(df, method='sim')}
rr <- rr %>% mutate(Dist.matrix = map(Communities, betamatrices))
The rr tibble now looks like this:
rr
# A tibble: 6 × 3
Region Communities Dist.matrix
<chr> <list> <list>
1 A <tibble [60]> <S3: dist>
2 B <tibble [84]> <S3: dist>
3 C <tibble [10]> <S3: dist>
4 D <tibble [6]> <S3: dist>
5 E <tibble [13]> <S3: dist>
6 F <tibble [2]> <S3: dist>
And then, I join the two tibbles:
my_tibble <- by_region %>% left_join(rr)
The tibble looks like this:
my_tibble
# A tibble: 6 × 4
Region IVs Communities Dist.matrix
<chr> <list> <list> <list>
1 A <tibble [60]> <tibble [60]> <S3: dist>
2 B <tibble [84]> <tibble [84]> <S3: dist>
3 C <tibble [10]> <tibble [10]> <S3: dist>
4 D <tibble [6]> <tibble [6]> <S3: dist>
5 E <tibble [13]> <tibble [13]> <S3: dist>
6 F <tibble [2]> <tibble [2]> <S3: dist>
And the function I want to apply looks like this:
mrm_model <- function(df){ecodist::MRM(Dist.matrix~dist(IV.1) + dist(IV.2),data = (df))}
When I try to compute it with the following code:
my_tibble <- my_tibble %>% mutate(mrm = map(IVs,mrm_model)),
I get this error message:
Error in mutate_impl(.data, dots) : object 'Dist.matrix' not found.
Do you have any idea why this keeps popping up?
When I try to "correct" the function with the $ sign:
mrm_model <- function(df){ecodist::MRM(my_tibble$Dist.matrix~dist(Area),data = (df))},
I get the following error:
Error in mutate_impl(.data, dots) :
invalid type (list) for variable 'my_tibble$Dist.matrix'.
I am an absolute newbie in this type of data-manipulation, so obviously I am over my head and I would greatly appreciate all the help I can get.
I figured out that the problem can be solved if the tibble contains BOTH the presence/absence data and the IVs. Anyway, thanks for the interest lukeA
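For readers hitting the same error: map(IVs, mrm_model) hands only the IVs tibble to the function, so a name like Dist.matrix cannot be found inside the formula. One possible way to link two list-columns row by row is purrr::map2, sketched below on made-up data. Here lm() on the vectorised distances is only a stand-in for ecodist::MRM, and toy and fit_pair are hypothetical names; this is not the fix described in the post above.

```r
library(dplyr)
library(purrr)

# Toy nested tibble: one row per region, with matching list-columns
set.seed(1)
toy <- tibble(
  Region      = c("A", "B"),
  IVs         = list(
    tibble(IV.1 = runif(5), IV.2 = runif(5)),
    tibble(IV.1 = runif(6), IV.2 = runif(6))
  ),
  Dist.matrix = list(dist(runif(5)), dist(runif(6)))
)

# Receives one distance matrix and the IVs tibble from the same row
fit_pair <- function(d, ivs) {
  lm(as.vector(d) ~ as.vector(dist(ivs$IV.1)) + as.vector(dist(ivs$IV.2)))
}

# map2() walks both list-columns in parallel
toy <- toy %>% mutate(fit = map2(Dist.matrix, IVs, fit_pair))

toy$fit[[1]]
```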
