For loop with two variables in R

I'm constructing a panel dataset, which is mostly going well, but I'm stuck on one step: creating some variables from another dataframe.
I'm pretty sure I need a for-loop but can't find a solution for this specific situation.
I have these two dataframes:
name <- c("apple", "apple", "apple", "orange", "orange", "orange", "orange","orange")
day <- c(1,8,9,0,2,2,2,7)
score <- c(7,7,8,1,5,8,4,4)
df1 <- data.frame(name, day, score)
and
name1 <- c("apple", "apple", "apple", "apple", "apple", "apple", "apple", "apple", "apple", "apple", "apple", "orange", "orange", "orange", "orange","orange", "orange", "orange", "orange","orange", "orange","orange")
day1 <- c(0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10)
volume_day <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
volume_day_cum <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
avg_score_day <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
avg_score_cum <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
var_day <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
var_cum <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
df2 <- data.frame(name1, day1, volume_day, volume_day_cum, avg_score_day, avg_score_cum, var_day, var_cum)
I have a panel dataset at the name-day level. Therefore, the instances in df1, which are given a score per row, need to be coded in df2 for the matching name and day. If there is no match, the 0 can stay. I'm looking for the number of instances (volume), the average score and the variance, each per day and cumulatively. The resulting dataframe should look like this:
volume_day <- c(0,1,0,0,0,0,0,0,1,1,0,1,0,3,0,0,0,0,1,0,0,0)
volume_day_cum <- c(0,1,1,1,1,1,1,1,2,3,3,1,1,4,4,4,4,4,5,5,5,5)
avg_score_day <- c(0,7,0,0,0,0,0,0,7,8,0,1,0,5.66,0,0,0,0,4,0,0,0)
avg_score_cum <- c(0,7,7,7,7,7,7,7,7,7.33,7.33,1,1,4.5,4.5,4.5,4.5,4.5,4.4,4.4,4.4,4.4)
var_day <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,2.88,0,0,0,0,0,0,0,0)
var_cum <- c(0,0,0,0,0,0,0,0,0,0.22,0.22,0,0,6.25,6.25,6.25,6.25,6.25,5.04,5.04,5.04,5.04)
resultdata <- data.frame(name1, day1, volume_day, volume_day_cum, avg_score_day, avg_score_cum, var_day, var_cum)
I'm relatively new to R and coding in general. If I have insufficiently described my issue just let me know. Hopefully someone can help me out here.

There are some inconsistencies between your df1 and your resultdata, but here's a shot:
library(dplyr)
# library(zoo)
df1 %>%
  group_by(name, day) %>%
  summarize(
    volume_day = as.numeric(n()),
    var_day = var(score),
    avg_score_day = mean(score),
    score = sum(score)
  ) %>%
  ungroup() %>%
  full_join(select(df2, name=name1, day=day1), by = c("name", "day")) %>%
  arrange(name, day) %>%
  group_by(name) %>%
  mutate_at(vars(volume_day, score, avg_score_day, var_day), ~ if_else(is.na(.), 0, .)) %>%
  mutate(
    volume_day_cum = cumsum(volume_day),
    avg_score_cum = if_else(cumsum(score) == 0, 0, cumsum(score) / volume_day_cum),
    var_cum = zoo::rollapply(score, n(), var, partial = TRUE)
  ) %>%
  print(n=99)
# # A tibble: 22 x 9
# # Groups: name [2]
# name day volume_day var_day avg_score_day score volume_day_cum avg_score_cum var_cum
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 apple 0 0 0 0 0 0 0 8.17
# 2 apple 1 1 0 7 7 1 7 7
# 3 apple 2 0 0 0 0 1 7 6.12
# 4 apple 3 0 0 0 0 1 7 9.53
# 5 apple 4 0 0 0 0 1 7 12.6
# 6 apple 5 0 0 0 0 1 7 11.8
# 7 apple 6 0 0 0 0 1 7 12.6
# 8 apple 7 0 0 0 0 1 7 11
# 9 apple 8 1 0 7 7 2 7 12.1
# 10 apple 9 1 0 8 8 3 7.33 13.5
# 11 apple 10 0 0 0 0 3 7.33 15.1
# 12 orange 0 1 0 1 1 1 1 47.2
# 13 orange 1 0 0 0 0 1 1 40.6
# 14 orange 2 3 4.33 5.67 17 4 4.5 35.1
# 15 orange 3 0 0 0 0 4 4.5 31.5
# 16 orange 4 0 0 0 0 4 4.5 28.6
# 17 orange 5 0 0 0 0 4 4.5 26.2
# 18 orange 6 0 0 0 0 4 4.5 29.0
# 19 orange 7 1 0 4 4 5 4.4 32
# 20 orange 8 0 0 0 0 5 4.4 2
# 21 orange 9 0 0 0 0 5 4.4 2.29
# 22 orange 10 0 0 0 0 5 4.4 2.67
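Note that var_cum above comes from a centered rolling window over the daily score totals, and var() is the sample variance, so those columns do not reproduce the cumulative figures in the question. Below is a minimal base R sketch that does. It assumes the question's numbers use the population variance (divide by n) and that empty name-day cells stay 0. The helper names pop_var and avg0 are made up for illustration.

```r
# Data from the question
df1 <- data.frame(
  name  = c("apple", "apple", "apple", "orange", "orange", "orange", "orange", "orange"),
  day   = c(1, 8, 9, 0, 2, 2, 2, 7),
  score = c(7, 7, 8, 1, 5, 8, 4, 4)
)

# Full name-day grid (days 0..10 for each name), same row order as df2
grid <- expand.grid(day = 0:10, name = c("apple", "orange"),
                    stringsAsFactors = FALSE)

# Hypothetical helpers: return 0 for empty input, as the question wants
pop_var <- function(x) if (length(x) == 0) 0 else mean((x - mean(x))^2)
avg0    <- function(x) if (length(x) == 0) 0 else mean(x)

# Loop over the two variables (name, day) via the grid rows
rows <- lapply(seq_len(nrow(grid)), function(i) {
  same_name <- df1$name == grid$name[i]
  s_day <- df1$score[same_name & df1$day == grid$day[i]]  # scores on this day
  s_cum <- df1$score[same_name & df1$day <= grid$day[i]]  # scores up to this day
  data.frame(
    name = grid$name[i], day = grid$day[i],
    volume_day = length(s_day), volume_day_cum = length(s_cum),
    avg_score_day = avg0(s_day), avg_score_cum = avg0(s_cum),
    var_day = pop_var(s_day), var_cum = pop_var(s_cum)
  )
})
result <- do.call(rbind, rows)
result
```

This rescans df1 for every grid row, which is fine for small panels. For large data a grouped cumulative sum of counts, sums and squared sums would be faster.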

Related

Calculated Column Based on Rows with Date Range

I have a dataframe as follows:
ID  Col1    RespID  Col3  Col4  Year  Month  Day
1   blue    729Ad   3.2   A     2021  April  2
2   orange  295gS   6.5   A     2021  April  1
3   red     729Ad   8.4   B     2021  April  20
4   yellow  592Jd   2.9   A     2021  March  12
5   green   937sa   3.5   B     2021  May    13
I would like to calculate a new column, Col5, whose value is 1 if the row has a Col4 value of A and there is another row somewhere in the dataset with the same RespID but a Col4 value of B. Otherwise its value is 0. Then I will drop all rows with a Col4 value of B, keeping just those with A. I'd also like to account for the date fields (Year, Month, Day) so that this is done within a time window of, say, 30 days: only if 'B' appears within 30 days of when 'A' appears in the dataset is there a 1 (if 'B' appears only within 60 days, there is no 1). Additionally, I'd like to keep everything as data.frames.
Here is what the desired output table would look like prior to dropping rows with Col4 value of B:
ID  Col1    RespID  Col3  Col4  Col5
1   blue    729Ad   3.2   A     1
2   orange  295gS   6.5   A     0
3   red     729Ad   8.4   B     0
4   yellow  592Jd   2.9   A     0
5   green   937sa   3.5   B     0
I have found Ronak's solution in this thread (Calculated Column Based on Rows in Tidymodels Recipe) useful, but I would like to modify it to take the date range into account.
A lot of things to unpack here.
I think you're tripping up over your own feet by trying to do too many things at once. I've broken down the code into four distinct steps to make the thought process easy to follow. Obviously, for use in a production environment it should be rewritten more efficiently.
1. Generate some data
library(tidyverse)
set.seed(42)
df <- tibble(
id = c(1:10),
resp_id = c(1701, seq(2286, 2289), 1701, seq(2290, 2293)),
grouping = sample(c("A", "B"), size = 10, replace = TRUE),
date = seq.Date(as.Date("2363-10-04"), as.Date("2363-11-17"), length.out = 10)
)
Resulting data:
# A tibble: 10 × 4
id resp_id grouping date
<int> <dbl> <chr> <date>
1 1 1701 A 2363-10-04
2 2 2286 A 2363-10-08
3 3 2287 A 2363-10-13
4 4 2288 A 2363-10-18
5 5 2289 B 2363-10-23
6 6 1701 B 2363-10-28
7 7 2290 B 2363-11-02
8 8 2291 B 2363-11-07
9 9 2292 A 2363-11-12
10 10 2293 B 2363-11-17
2. Check grouping
df <- df %>%
mutate(
is_a = ifelse(grouping == "A", 1, 0),
is_b = ifelse(grouping == "B", 1, 0)
)
We have the grouping now as easy-to-use dummy variables:
> df
# A tibble: 10 × 6
id resp_id grouping date is_a is_b
<int> <dbl> <chr> <date> <dbl> <dbl>
1 1 1701 A 2363-10-04 1 0
2 2 2286 A 2363-10-08 1 0
3 3 2287 A 2363-10-13 1 0
4 4 2288 A 2363-10-18 1 0
5 5 2289 B 2363-10-23 0 1
6 6 1701 B 2363-10-28 0 1
7 7 2290 B 2363-11-02 0 1
8 8 2291 B 2363-11-07 0 1
9 9 2292 A 2363-11-12 1 0
10 10 2293 B 2363-11-17 0 1
3. Check completeness
df <- df %>%
group_by(
resp_id
) %>%
mutate(
# Check if the grouping has both "A" and "B" values
is_complete = ifelse(
sum(is_a) > 0 & sum(is_b) > 0,
1,
0
)
) %>%
ungroup()
We see that there is only one resp_id value that is complete — 1701:
> df
# A tibble: 10 × 7
id resp_id grouping date is_a is_b is_complete
<int> <dbl> <chr> <date> <dbl> <dbl> <dbl>
1 1 1701 A 2363-10-04 1 0 1
2 2 2286 A 2363-10-08 1 0 0
3 3 2287 A 2363-10-13 1 0 0
4 4 2288 A 2363-10-18 1 0 0
5 5 2289 B 2363-10-23 0 1 0
6 6 1701 B 2363-10-28 0 1 1
7 7 2290 B 2363-11-02 0 1 0
8 8 2291 B 2363-11-07 0 1 0
9 9 2292 A 2363-11-12 1 0 0
10 10 2293 B 2363-11-17 0 1 0
4. Assign target value
df <- df %>%
  group_by(
    resp_id
  ) %>%
  mutate(
    # Check if the "A" part of a complete grouping has another value within 30 days
    is_within_timeframe = ifelse(
      is_complete == 1 & is_a == 1 & max(date) - min(date) <= 30,
      1,
      0
    )
  ) %>%
  ungroup()
We see that our one complete set does in fact have a B value that falls within 30 days of the A observation. (Caveat: because the check compares max(date) with min(date) per resp_id, this only works if there are always exactly one or two observations per resp_id!) Column is_within_timeframe corresponds to your Col5:
> df
# A tibble: 10 × 8
id resp_id grouping date is_a is_b is_complete is_within_timeframe
<int> <dbl> <chr> <date> <dbl> <dbl> <dbl> <dbl>
1 1 1701 A 2363-10-04 1 0 1 1
2 2 2286 A 2363-10-08 1 0 0 0
3 3 2287 A 2363-10-13 1 0 0 0
4 4 2288 A 2363-10-18 1 0 0 0
5 5 2289 B 2363-10-23 0 1 0 0
6 6 1701 B 2363-10-28 0 1 1 0
7 7 2290 B 2363-11-02 0 1 0 0
8 8 2291 B 2363-11-07 0 1 0 0
9 9 2292 A 2363-11-12 1 0 0 0
10 10 2293 B 2363-11-17 0 1 0 0
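The caveat above can be avoided by testing each A row against every B row of the same respondent. Here is a minimal base R sketch, assuming "within 30 days" means 30 days before or after the A row (the question does not say which). The function name has_b_within and the sixth data row are made up for illustration. The first five rows mirror the question's table with the dates already parsed.

```r
df <- data.frame(
  id       = 1:6,
  resp_id  = c("729Ad", "295gS", "729Ad", "592Jd", "937sa", "592Jd"),
  grouping = c("A", "A", "B", "A", "B", "B"),
  date     = as.Date(c("2021-04-02", "2021-04-01", "2021-04-20",
                       "2021-03-12", "2021-05-13", "2021-06-30")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 for an "A" row that has at least one "B" row with
# the same resp_id dated within `window` days (either direction), else 0
has_b_within <- function(df, window = 30) {
  vapply(seq_len(nrow(df)), function(i) {
    if (df$grouping[i] != "A") return(0L)
    b_dates <- df$date[df$resp_id == df$resp_id[i] & df$grouping == "B"]
    gap <- abs(as.numeric(difftime(b_dates, df$date[i], units = "days")))
    as.integer(any(gap <= window))
  }, integer(1))
}

df$col5 <- has_b_within(df)

# Keep only the "A" rows; the result is still a plain data.frame
kept <- df[df$grouping == "A", ]
kept
```

Respondent 592Jd has a B row, but it lies more than 30 days after the A row, so its col5 stays 0. Any number of A or B rows per respondent is handled.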

R - Grouping rows by matching value then adding rows to matching columns in another data frame

I am trying to add values from one data frame (ex2) to an existing data frame (ex1) based on two different columns. As you can see, there is an ID column in both data frames. But in ex2, each column of ex1 is represented by a different row instead of a column. For each matching ID, I want to add the result from ex2$result to the matching row in ex1 under the appropriate column heading (if ex2$alpha[i] = a then ex2$result[i] gets added to ex1$a[z] where ex2$id[i]=ex1$id[z]). Another complication is that not all of the columns from ex1 will have alpha value in ex2, so those should be set as 'NA'.
ex1 <- data.frame(
id = c(1:20),
a = c(rep(1,5),rep(0,5),rep(NA,10)),
b = c(rep(c(1,0),5),rep(NA,10)),
c = c(rep(c(0,1),5),rep(NA,10)),
d = c(rep(0,5),rep(1,5),rep(NA,10))
)
ex2 <- data.frame(
id = c(rep(11,3),rep(12,3),rep(13,3),
rep(14,2),rep(15,2),
rep(16,4),rep(17,4),rep(18,4),rep(19,4),rep(20,4)),
alpha = c(rep(c('a','b','d'),3),rep(c('a','b'),2),
rep(c('a','b','c','d'),5)),
result = c(rep(c(0,1,1),11))
)
Thanks for your help!
I believe the attached snippet does what you want it to do. But it is hard to know from your toy data if it is feasible to write out the columns a to d in the mutate statement. There surely is a more clever programmatic way to approach this problem.
ex1 <- data.frame(
id = c(1:20),
a = c(rep(1,5),rep(0,5),rep(NA,10)),
b = c(rep(c(1,0),5),rep(NA,10)),
c = c(rep(c(0,1),5),rep(NA,10)),
d = c(rep(0,5),rep(1,5),rep(NA,10))
)
ex2 <- data.frame(
id = c(rep(11,3),rep(12,3),rep(13,3),
rep(14,2),rep(15,2),
rep(16,4),rep(17,4),rep(18,4),rep(19,4),rep(20,4)),
alpha = c(rep(c('a','b','d'),3),rep(c('a','b'),2),
rep(c('a','b','c','d'),5)),
result = c(rep(c(0,1,1),11))
)
library(tidyverse)
ex_2_wide <- pivot_wider(ex2, id_cols = id, names_from = alpha, values_from = result )
joined <- full_join(ex1, ex_2_wide, by = c("id" = "id")) %>%
mutate(a = coalesce(a.x, a.y)) %>%
mutate(b = coalesce(b.x, b.y)) %>%
mutate(c = coalesce(c.x, c.y)) %>%
mutate(d = coalesce(d.x, d.y)) %>%
select(-(a.x:c.y))
joined
#> id a b c d
#> 1 1 1 1 0 0
#> 2 2 1 0 1 0
#> 3 3 1 1 0 0
#> 4 4 1 0 1 0
#> 5 5 1 1 0 0
#> 6 6 0 0 1 1
#> 7 7 0 1 0 1
#> 8 8 0 0 1 1
#> 9 9 0 1 0 1
#> 10 10 0 0 1 1
#> 11 11 0 1 NA 1
#> 12 12 0 1 NA 1
#> 13 13 0 1 NA 1
#> 14 14 0 1 NA NA
#> 15 15 1 0 NA NA
#> 16 16 1 1 0 1
#> 17 17 1 0 1 1
#> 18 18 0 1 1 0
#> 19 19 1 1 0 1
#> 20 20 1 0 1 1
Created on 2021-01-07 by the reprex package (v0.3.0)
EDIT:
If we turn the problem around (first reshape ex1 to long format, then join and coalesce, then pivot back to wide), a single coalesce step is enough, no matter how many columns you have.
library(tidyverse)
ex1_long <- pivot_longer(ex1, cols = a:d, names_to = "alpha")
joined <- full_join(ex1_long, ex2, by = c("id" = "id", "alpha" = "alpha")) %>%
mutate(value = coalesce(value, result)) %>% select(-result) %>%
pivot_wider(id_cols = id, names_from = alpha, values_from = value)
joined
#> # A tibble: 20 x 5
#> id a b c d
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 0 0
#> 2 2 1 0 1 0
#> 3 3 1 1 0 0
#> 4 4 1 0 1 0
#> 5 5 1 1 0 0
#> 6 6 0 0 1 1
#> 7 7 0 1 0 1
#> 8 8 0 0 1 1
#> 9 9 0 1 0 1
#> 10 10 0 0 1 1
#> 11 11 0 1 NA 1
#> 12 12 0 1 NA 1
#> 13 13 0 1 NA 1
#> 14 14 0 1 NA NA
#> 15 15 1 0 NA NA
#> 16 16 1 1 0 1
#> 17 17 1 0 1 1
#> 18 18 0 1 1 0
#> 19 19 1 1 0 1
#> 20 20 1 0 1 1
Created on 2021-01-07 by the reprex package (v0.3.0)
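For comparison, the same fill can be sketched in base R with a two-column index matrix of (row, column) positions. This is only an illustration, not a drop-in replacement. Unlike the coalesce() versions above it overwrites whatever is already in the target cell, which makes no difference for this toy data because ids 11 to 20 are all NA in ex1. The names m, idx and filled are made up.

```r
ex1 <- data.frame(
  id = 1:20,
  a = c(rep(1, 5), rep(0, 5), rep(NA, 10)),
  b = c(rep(c(1, 0), 5), rep(NA, 10)),
  c = c(rep(c(0, 1), 5), rep(NA, 10)),
  d = c(rep(0, 5), rep(1, 5), rep(NA, 10))
)
ex2 <- data.frame(
  id = c(rep(11, 3), rep(12, 3), rep(13, 3), rep(14, 2), rep(15, 2),
         rep(16, 4), rep(17, 4), rep(18, 4), rep(19, 4), rep(20, 4)),
  alpha = c(rep(c("a", "b", "d"), 3), rep(c("a", "b"), 2),
            rep(c("a", "b", "c", "d"), 5)),
  result = rep(c(0, 1, 1), 11),
  stringsAsFactors = FALSE
)

# Work on the value columns as a numeric matrix
m <- as.matrix(ex1[, c("a", "b", "c", "d")])

# One (row, column) pair per ex2 row: row from the id, column from alpha
idx <- cbind(match(ex2$id, ex1$id), match(ex2$alpha, colnames(m)))

# Matrix indexing assigns each result to its cell; untouched cells keep NA
m[idx] <- ex2$result

filled <- data.frame(id = ex1$id, m)
filled
```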

Calculating percentages by category in R

Here is a sample of my data:
df <- read.table(header = TRUE, text =
"book pen desk ipad
3 4 3 4
3 0 0 3
0 3 0 2
1 3 2 1
4 1 4 3
0 0 3 1
2 1 3 2
0 2 1 0
4 2 2 2
0 1 2 1
1 4 1 4
2 0 1 3
4 3 2 0
4 0 4 2"
)
The logic is that I want to have three categories: Low, Medium and High.
As an example, for the column book, the values 0 and 1 = Low, 2 = Medium, and 3 and 4 = High. Next, I want to calculate the percentage for each category. As you can see below, for Low in the column book the percentage is 42.85. I want an output like this for all columns. Please note this is just a sample. Thanks for your help.
Class Low Medium High
book 42.85 xx xx
pen xx xx xx
desk xx xx xx
ipad xx xx xx
ret <- t(sapply(df, function(a) {
  lbls <- c("Low", "Medium", "High")
  # Half-open bins [0,2), [2,3), [3,Inf): 0-1 = Low, 2 = Medium, 3-4 = High
  ct <- cut(a, c(0, 2, 3, Inf), right = FALSE, labels = lbls)
  table(ct)
}))
t(apply(ret, 1, function(z) 100*z/sum(z)))
# Low Medium High
# book 42.85714 14.28571 42.85714
# pen 50.00000 14.28571 35.71429
# desk 35.71429 28.57143 35.71429
# ipad 35.71429 28.57143 35.71429
As a data.frame:
out <- as.data.frame(t(apply(ret, 1, function(z) 100*z/sum(z))))
out$Class <- rownames(out)
# rownames(out) <- NULL # optional, if you don't want them
out <- out[,c(4,1:3)]
out
# Class Low Medium High
# book book 42.85714 14.28571 42.85714
# pen pen 50.00000 14.28571 35.71429
# desk desk 35.71429 28.57143 35.71429
# ipad ipad 35.71429 28.57143 35.71429
Here's a tidyverse solution for you:
library(tidyverse)
df <- read.table(header = TRUE, text =
"book pen desk ipad
3 4 3 4
3 0 0 3
0 3 0 2
1 3 2 1
4 1 4 3
0 0 3 1
2 1 3 2
0 2 1 0
4 2 2 2
0 1 2 1
1 4 1 4
2 0 1 3
4 3 2 0
4 0 4 2"
)
df %>%
pivot_longer(1:4,
names_to = "Class",
values_to = "value") %>%
mutate(category = case_when(value %in% 0:1 ~ "l",
value == 2 ~ "m",
value %in% 3:4 ~ "h")) %>%
group_by(Class, category) %>%
count(category) %>%
pivot_wider(names_from = category, values_from = n) %>%
transmute(Class = Class,
High = h / sum(h, m, l)*100,
Medium = m / sum(h, m, l)*100,
Low = l / sum(h, m, l)*100)
And the resulting table:
# A tibble: 4 x 4
# Groups: Class [4]
Class High Medium Low
<chr> <dbl> <dbl> <dbl>
1 book 42.9 14.3 42.9
2 desk 35.7 28.6 35.7
3 ipad 35.7 28.6 35.7
4 pen 35.7 14.3 50
Here is a possible solution. There may be a better one, but this is what I came up with spontaneously.
Greetings
df_test <- read.table(header = TRUE, text =
"book pen desk ipad
3 4 3 4
3 0 0 3
0 3 0 2
1 3 2 1
4 1 4 3
0 0 3 1
2 1 3 2
0 2 1 0
4 2 2 2
0 1 2 1
1 4 1 4
2 0 1 3
4 3 2 0
4 0 4 2"
)
low <- list()
medium <- list()
high <- list()
for(i in 1:ncol(df_test))
{
  # Categories as specified in the question: 0-1 = Low, 2 = Medium, 3-4 = High.
  # Each step keeps matching values (others become NA), then turns the
  # share of non-NA entries into a percentage.
  low[[i]] <- ifelse((df_test[,i]==0 | df_test[,i]==1),df_test[,i],NA)
  low[[i]] <- sum(colSums(!is.na(t(low[[i]])))) / length(low[[i]]) *100
  medium[[i]] <- ifelse(df_test[,i]==2,df_test[,i],NA)
  medium[[i]] <- sum(colSums(!is.na(t(medium[[i]])))) / length(medium[[i]]) *100
  high[[i]] <- ifelse((df_test[,i]==3 | df_test[,i]==4),df_test[,i],NA)
  high[[i]] <- sum(colSums(!is.na(t(high[[i]])))) / length(high[[i]]) *100
}
names(low) <- colnames(df_test)
names(medium) <- colnames(df_test)
names(high) <- colnames(df_test)
df_test_final <- data.frame("Class"=colnames(df_test),"Low"=NA,"Medium"=NA,"High"=NA)
df_test_final[,2] <- do.call(rbind,low)
df_test_final[,3] <- do.call(rbind,medium)
df_test_final[,4] <- do.call(rbind,high)

r convert summary data to presence/absence data

I conducted 5 presence/absence measures at multiple sites and summed them together and ended up with a dataframe that looked something like this:
df <- data.frame("site" = c("a", "b", "c"),
"species1" = c(0, 2, 1),
"species2" = c(5, 2, 4))
ie. at site "a" species1 was recorded 0/5 times and species2 was recorded 5/5 times.
What I would like to do is convert this back into presence/absence data. Something like this:
data.frame("site" = rep(c("a", "b", "c"), each = 5),
"species1" = c(0,0,0,0,0, 1,1,0,0,0, 1,0,0,0,0),
"species2" = c(1,1,1,1,1, 1,1,0,0,0, 1,1,1,1,0))
I can duplicate each row 5 times with:
df %>% slice(rep(1:n(), each = 5))
but I can't figure out how to change "2" into "1,1,0,0,0". Ideally the order of the 1s and 0s (within each site) would also be randomised (ie. "0,0,1,0,1"), but that might be too difficult.
Any help would be appreciated.
We can also use uncount
library(dplyr)
library(tidyr)
df %>%
uncount(max(species2), .remove = FALSE) %>%
group_by(site) %>%
mutate(across(starts_with('species'), ~ as.integer(row_number() <= first(.))))
# A tibble: 15 x 3
# Groups: site [3]
# site species1 species2
# <chr> <int> <int>
# 1 a 0 1
# 2 a 0 1
# 3 a 0 1
# 4 a 0 1
# 5 a 0 1
# 6 b 1 1
# 7 b 1 1
# 8 b 0 0
# 9 b 0 0
#10 b 0 0
#11 c 1 1
#12 c 0 1
#13 c 0 1
#14 c 0 1
#15 c 0 0
After repeating the rows, you can compare the row number within each site with the value of the respective column and assign 1 if the current row number is less than or equal to that value.
library(dplyr)
df %>%
slice(rep(seq_len(n()), each = 5)) %>%
group_by(site) %>%
mutate(across(starts_with('species'), ~+(row_number() <= first(.))))
#Use mutate_at with old dplyr
#mutate_at(vars(starts_with('species')), ~+(row_number() <= first(.)))
# site species1 species2
# <chr> <int> <int>
# 1 a 0 1
# 2 a 0 1
# 3 a 0 1
# 4 a 0 1
# 5 a 0 1
# 6 b 1 1
# 7 b 1 1
# 8 b 0 0
# 9 b 0 0
#10 b 0 0
#11 c 1 1
#12 c 0 1
#13 c 0 1
#14 c 0 1
#15 c 0 0
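Both answers put the 1s first within each site. The question also asks for a random order, which can be sketched in base R by shuffling a vector of k ones and 5 - k zeros per site and species. The helper name shuffle_counts and the variable n_visits are made up for illustration, and set.seed() is only there to make the shuffle reproducible.

```r
set.seed(1)  # only so that the random order is reproducible

df <- data.frame(site = c("a", "b", "c"),
                 species1 = c(0, 2, 1),
                 species2 = c(5, 2, 4))
n_visits <- 5  # presence/absence measures per site, as stated in the question

# Hypothetical helper: k ones and (n - k) zeros in random order
shuffle_counts <- function(k, n) sample(rep(c(1L, 0L), times = c(k, n - k)))

species_cols <- grep("^species", names(df), value = TRUE)

# One block of n_visits rows per site, filled column by column
long <- data.frame(site = rep(df$site, each = n_visits))
for (col in species_cols) {
  long[[col]] <- unlist(lapply(df[[col]], shuffle_counts, n = n_visits))
}
long
```

Each species column is shuffled independently, so the per-site totals still match the original counts while the positions of the 1s differ between species.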

Summarize data table individually for multiple columns

I am trying to summarize data across multiple columns automatically if at all possible rather than writing code for each column independently. I would like to summarize this:
Patch Size Achmil Aciarv Aegpod Agrcap
A 10 0 1 1 0
B 2 1 0 0 0
C 2 1 0 0 0
D 2 1 0 0 0
into this
Species Presence MaxSize MeanSize Count
Achmil 0 10 10 1
Achmil 1 2 2 3
Aciarv 0 2 2 3
Aciarv 1 10 10 1
I know that I can individually run group_by and summarize for each column
achmil<-group_by(LimitArea, Achmil) %>%
summarise(SumA=mean(Size))
but is there no way to automatically run this for each column for each presence and absence using some sort of loop? Any help is appreciated.
Perhaps we need to gather into long format and then do the summarise:
library(tidyverse)
gather(df1, Species, Presence, Achmil:Agrcap) %>%
group_by(Species, Presence) %>%
summarise( MaxSize = max(Size), MeanSize = mean(Size), Count = n())
# A tibble: 7 x 5
# Groups: Species [?]
# Species Presence MaxSize MeanSize Count
# <chr> <int> <dbl> <dbl> <int>
#1 Achmil 0 10.0 10.0 1
#2 Achmil 1 2.00 2.00 3
#3 Aciarv 0 2.00 2.00 3
#4 Aciarv 1 10.0 10.0 1
#5 Aegpod 0 2.00 2.00 3
#6 Aegpod 1 10.0 10.0 1
#7 Agrcap 0 10.0 4.00 4
In the newer version of dplyr/tidyr, we can use pivot_longer
df1 %>%
pivot_longer(cols = Achmil:Agrcap, names_to = "Species",
values_to = "Presence") %>%
group_by(Species, Presence) %>%
summarise(MaxSize = max(Size), MeanSize = mean(Size), Count = n())
Here is another solution using aggregate() (and reshape2::melt()):
library(reshape2)
df = melt(df[,2:ncol(df)], "Size")
aggregate(. ~ `variable`+`value`, data = df,
FUN = function(x) c(max = max(x), mean = mean(x), count = length(x)))
variable value Size.max Size.mean Size.count
1 Achmil 0 10 10 1
2 Aciarv 0 2 2 3
3 Aegpod 0 2 2 3
4 Agrcap 0 10 4 4
5 Achmil 1 2 2 3
6 Aciarv 1 10 10 1
7 Aegpod 1 10 10 1
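Since the question asks for "some sort of loop", here is a minimal base R sketch that loops over the species columns with lapply() and stacks the per-species summaries. The data frame is retyped from the table in the question under the name LimitArea, which the question uses. The helper name summarise_one is made up for illustration.

```r
LimitArea <- data.frame(
  Patch  = c("A", "B", "C", "D"),
  Size   = c(10, 2, 2, 2),
  Achmil = c(0, 1, 1, 1),
  Aciarv = c(1, 0, 0, 0),
  Aegpod = c(1, 0, 0, 0),
  Agrcap = c(0, 0, 0, 0)
)

# Every column except Patch and Size is treated as a species column
species <- setdiff(names(LimitArea), c("Patch", "Size"))

# Hypothetical helper: summarise Size by presence/absence for one species
summarise_one <- function(sp) {
  groups <- split(LimitArea$Size, LimitArea[[sp]])
  data.frame(
    Species  = sp,
    Presence = as.numeric(names(groups)),
    MaxSize  = vapply(groups, max, numeric(1)),
    MeanSize = vapply(groups, mean, numeric(1)),
    Count    = vapply(groups, length, integer(1)),
    row.names = NULL
  )
}

# Loop over the species columns and stack the results
result <- do.call(rbind, lapply(species, summarise_one))
result
```

A species with only one presence value, such as Agrcap here, contributes a single row.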
