I want to summarise a data frame [duplicate] - r

This question already has answers here:
count number of rows in a data frame in R based on group [duplicate]
(8 answers)
Closed 1 year ago.
I want to summarize the following data frame into a summary table.
plot <- c(rep(1,2), rep(2,4), rep(3,3))
bird <- c('a','b', 'a','b', 'c', 'd', 'a', 'b', 'c')
area <- c(rep(10,2), rep(5,4), rep(15,3))
birdlist <- data.frame(plot,bird,area)
birdlist
  plot bird area
1    1    a   10
2    1    b   10
3    2    a    5
4    2    b    5
5    2    c    5
6    2    d    5
7    3    a   15
8    3    b   15
9    3    c   15
I tried the following:
birdlist %>%
group_by(plot, area) %>%
mutate(count(bird))
I am trying to get a data frame that looks like the following:
plot bird area
   1    2   10
   2    4    5
   3    3   15
Please advise on how to count the birds in each plot while keeping the respective area of the plot. Thanks.

You were very close. You want summarize() instead of mutate(), and you can use n() to count the number of rows within each group you specify.
library(tidyverse)
birdlist %>%
  group_by(plot, area) %>%
  summarize(bird = n(),
            .groups = "drop")
#> # A tibble: 3 x 3
#> plot area bird
#> <dbl> <dbl> <int>
#> 1 1 10 2
#> 2 2 5 4
#> 3 3 15 3
If you're set on count(), use it without group_by():
birdlist %>%
  count(plot, area, name = "bird")
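As a side note, both approaches above count rows per group. If a plot could list the same bird more than once and the goal is the number of distinct birds, n_distinct() may be closer to what is wanted. A minimal sketch, assuming the birdlist data from the question (the name bird_counts is only illustrative):

```r
library(dplyr)

# Data from the question
plot <- c(rep(1, 2), rep(2, 4), rep(3, 3))
bird <- c("a", "b", "a", "b", "c", "d", "a", "b", "c")
area <- c(rep(10, 2), rep(5, 4), rep(15, 3))
birdlist <- data.frame(plot, bird, area)

# Count distinct birds per plot/area rather than rows
bird_counts <- birdlist %>%
  group_by(plot, area) %>%
  summarise(bird = n_distinct(bird), .groups = "drop")

bird_counts
```

For this data every bird appears once per plot, so the counts match the row counts shown above (2, 4 and 3).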

We could group_by plot and summarise using unique():
birdlist %>%
  group_by(plot) %>%
  summarise(bird = n(), area = unique(area))
plot bird area
<dbl> <int> <dbl>
1 1 2 10
2 2 4 5
3 3 3 15
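Note that unique(area) only yields one row per plot when area is constant within each plot. A hedged sketch that makes this assumption explicit and uses first() instead, again with the question's data (plot_summary is an illustrative name):

```r
library(dplyr)

birdlist <- data.frame(
  plot = c(rep(1, 2), rep(2, 4), rep(3, 3)),
  bird = c("a", "b", "a", "b", "c", "d", "a", "b", "c"),
  area = c(rep(10, 2), rep(5, 4), rep(15, 3))
)

# Assumption: each plot has exactly one area value; stop early if not
stopifnot(nrow(distinct(birdlist, plot, area)) == n_distinct(birdlist$plot))

plot_summary <- birdlist %>%
  group_by(plot) %>%
  summarise(bird = n(), area = first(area))

plot_summary
```

If the check fails, grouping by both plot and area (as in the other answer) is probably the safer choice.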

Related

R data imputation from group_by table [duplicate]

This question already has answers here:
How to replace NA with mean by group / subset?
(5 answers)
Closed 7 months ago.
group = c(1,1,4,4,4,5,5,6,1,4,6)
animal = c('a','b','c','c','d','a','b','c','b','d','c')
sleep = c(14,NA,22,15,NA,96,100,NA,50,2,1)
test = data.frame(group, animal, sleep)
print(test)
group_animal = test %>% group_by(group, animal) %>% summarise(mean_sleep = mean(sleep, na.rm = TRUE))
I would like to replace the NA values in the sleep column with the mean sleep value computed per group and animal.
Is there any way that I can perform some sort of lookup like Excel that matches group and animal from the test dataframe to the group_animal dataframe and replaces the NA value in the sleep column from the test df with the sleep value in the group_animal df?
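We could use mutate() instead of summarise(). summarise() returns a single row per group, whereas mutate() keeps every row, so each NA can be replaced in place with the mean of its own group: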
library(dplyr)
library(tidyr)
test <- test %>%
  group_by(group, animal) %>%
  mutate(sleep = replace_na(sleep, mean(sleep, na.rm = TRUE))) %>%
  ungroup()
Output:
test
# A tibble: 11 × 3
group animal sleep
<dbl> <chr> <dbl>
1 1 a 14
2 1 b 50
3 4 c 22
4 4 c 15
5 4 d 2
6 5 a 96
7 5 b 100
8 6 c 1
9 1 b 50
10 4 d 2
11 6 c 1
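The Excel-style lookup described in the question can also be written as a join against the group_animal table. This is only a sketch of that alternative, assuming the test data from the question; the name filled is illustrative:

```r
library(dplyr)

test <- data.frame(
  group  = c(1, 1, 4, 4, 4, 5, 5, 6, 1, 4, 6),
  animal = c("a", "b", "c", "c", "d", "a", "b", "c", "b", "d", "c"),
  sleep  = c(14, NA, 22, 15, NA, 96, 100, NA, 50, 2, 1)
)

# Lookup table: mean sleep per group/animal pair
group_animal <- test %>%
  group_by(group, animal) %>%
  summarise(mean_sleep = mean(sleep, na.rm = TRUE), .groups = "drop")

# Join the lookup back, fill NAs from it, then drop the helper column
filled <- test %>%
  left_join(group_animal, by = c("group", "animal")) %>%
  mutate(sleep = coalesce(sleep, mean_sleep)) %>%
  select(-mean_sleep)

filled
```

The mutate() version above is shorter; the join is mainly useful when the lookup table already exists or comes from a different data set. Either way, a group/animal pair with no observed values has no mean to fill from.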

Stepwise column sum in data frame based on another column in R

I have a data frame like this:
Team  GF
A     3
B     5
A     2
A     3
B     1
B     6
Looking for output like this (just an additional column):
Team  x  avg(x)
A     3  0
B     5  0
A     2  3
A     3  2.5
B     1  5
B     6  3
avg(x) is the average of all previous instances of x where Team is the same. I have the following R code which gets the overall average, however I'm looking for the "step-wise" average.
new_df <- df %>% group_by(Team) %>% summarise(avg_x = mean(x))
Is there a way to vectorize this while only evaluating the previous rows on each "iteration"?
You want the cummean() function from dplyr, combined with lag():
df %>% group_by(Team) %>% mutate(avg_x = replace_na(lag(cummean(x)), 0))
Producing the following:
# A tibble: 6 × 3
# Groups: Team [2]
Team x avg_x
<chr> <dbl> <dbl>
1 A 3 0
2 B 5 0
3 A 2 3
4 A 3 2.5
5 B 1 5
6 B 6 3
As required.
Edit 1:
As @Ritchie Sacramento pointed out, the following is cleaner and clearer:
df %>% group_by(Team) %>% mutate(avg_x = lag(cummean(x), default = 0))
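For anyone who wants to run this, here is a self-contained sketch. The question only shows df as a table, so the data.frame() call below is an assumed reconstruction of it, and result is an illustrative name:

```r
library(dplyr)

# Example data reconstructed from the question's table
df <- data.frame(
  Team = c("A", "B", "A", "A", "B", "B"),
  x    = c(3, 5, 2, 3, 1, 6)
)

# cummean() gives the running mean up to and including each row;
# lag() shifts it down one row so each row only sees earlier rows
result <- df %>%
  group_by(Team) %>%
  mutate(avg_x = lag(cummean(x), default = 0)) %>%
  ungroup()

result
```

Dropping default = 0 leaves NA in the first row of each team, which may be preferable if a team with no earlier rows should be distinguishable from a team whose earlier average really is 0.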

Is there a way to count occurrence within a group in R? [duplicate]

This question already has answers here:
r Group by and count
(3 answers)
count number of rows in a data frame in R based on group [duplicate]
(8 answers)
Closed 1 year ago.
I have a list of people grouped by their counties and by villages. I would like to count the number of villages in the respective counties. I am able to count the number of people in each county.
library(dplyr)
set.seed(123)
df <- data.frame(
person = 1:100,
county = round(runif(100, 1, 5)),
village = round(runif(100, 1, 10))
)
# Number of people per county
df %>% count(county)
library(dplyr)
df %>%
  group_by(county) %>%
  add_count(village)
Output:
person county village n
<int> <dbl> <dbl> <int>
1 1 2 6 4
2 2 4 4 8
3 3 3 5 5
4 4 5 10 1
5 5 5 5 3
6 6 1 9 2
7 7 3 9 1
8 8 5 6 2
9 9 3 5 5
10 10 3 2 6
# ... with 90 more rows
Would that work for you, Moses?
df %>% group_by(county) %>%
  count(village, county)
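The snippets above count people per village within each county. If the goal is instead the number of distinct villages per county, as the question's wording suggests, n_distinct() inside summarise() is one option. A minimal sketch using the question's simulated data (villages_per_county and n_villages are illustrative names, and it assumes village ids are only meaningful within a county):

```r
library(dplyr)

set.seed(123)
df <- data.frame(
  person  = 1:100,
  county  = round(runif(100, 1, 5)),
  village = round(runif(100, 1, 10))
)

# One row per county with the number of distinct village ids in it
villages_per_county <- df %>%
  group_by(county) %>%
  summarise(n_villages = n_distinct(village))

villages_per_county
```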

R add rows to grouped df using dplyr

I have a grouped df and I would like to add additional rows to the top of the groups that match with a variable (item_code) from the df.
The additional rows do not have an id column. The additional rows should not be duplicated within the groups of df.
Example data:
df <- as_tibble(data.frame(id = rep(1:3, each = 2),
                           item_code = c("A", "A", "B", "B", "B", "Z"),
                           score = rep(1, 6)))
additional_rows <- as_tibble(data.frame(item_code = c("A", "Z"),
                                        score = c(6, 6)))
What I tried
I found this post and tried to apply it:
Add row in each group using dplyr and add_row()
df %>% group_by(id) %>% do(add_row(additional_rows %>%
filter(item_code %in% .$item_code)))
What I get:
# A tibble: 9 x 3
# Groups: id [3]
id item_code score
<int> <fct> <dbl>
1 1 A 6
2 1 Z 6
3 1 NA NA
4 2 A 6
5 2 Z 6
6 2 NA NA
7 3 A 6
8 3 Z 6
9 3 NA NA
What I am looking for:
# A tibble: 8 x 3
id item_code score
<int> <fct> <dbl>
1 1 A 6
2 1 A 1
3 1 A 1
4 2 B 1
5 2 B 1
6 3 B 1
7 3 Z 6
8 3 Z 1
This should do the trick:
library(plyr)
df %>%
  join(subset(df, item_code %in% additional_rows$item_code, select = c(id, item_code)) %>%
         join(additional_rows) %>%
         subset(!duplicated(.)), type = "full") %>%
  arrange(id, item_code, -score)
Not sure if it's the best way, but it works.
Edit: to get the score in the same order, I added the other arrange terms.
Edit 2: there should now be no duplicated rows added from the additional rows, as per your comment.
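The same idea can be sketched with dplyr alone, which avoids loading plyr alongside dplyr. This is an alternative under the assumptions stated in the comments, not the method used above; new_rows and result are illustrative names:

```r
library(dplyr)

df <- data.frame(
  id        = rep(1:3, each = 2),
  item_code = c("A", "A", "B", "B", "B", "Z"),
  score     = rep(1, 6),
  stringsAsFactors = FALSE
)
additional_rows <- data.frame(
  item_code = c("A", "Z"),
  score     = c(6, 6),
  stringsAsFactors = FALSE
)

# One candidate row per (id, item_code) pair that has a match in additional_rows
new_rows <- df %>%
  distinct(id, item_code) %>%
  inner_join(additional_rows, by = "item_code")

# Stack the new rows onto the original data. Assumption: the added scores (6)
# are higher than the existing ones (1), so sorting by descending score
# puts each added row at the top of its id/item_code block.
result <- bind_rows(new_rows, df) %>%
  arrange(id, item_code, desc(score))

result
```

If the added scores are not guaranteed to be the largest, a helper column marking the added rows would be a more robust sort key.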

How to combine data points in a data frame in R?

The data frame x has a column in which the values are periodic. For each unique value in that column, I want to calculate the sum of the second column. If x is something like this:
x <- data.frame(a=c(1:2,1:2,1:2),b=c(1,4,5,2,3,4))
a b
1 1 1
2 2 4
3 1 5
4 2 2
5 1 3
6 2 4
The output I want is the following data frame:
a b
1 9
2 10
Using aggregate() as follows will get you your desired result:
aggregate(b ~ a, x, sum)
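To spell out the formula interface: the column on the left of ~ is the one being summed and the column on the right defines the groups. A runnable sketch with the question's data (res is an illustrative name):

```r
x <- data.frame(a = c(1:2, 1:2, 1:2), b = c(1, 4, 5, 2, 3, 4))

# Left of ~ : values to aggregate; right of ~ : grouping column
res <- aggregate(b ~ a, data = x, FUN = sum)
res
#   a  b
# 1 1  9
# 2 2 10
```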
Here is an option with dplyr:
library(dplyr)
x %>%
  group_by(a) %>%
  summarise(b = sum(b))
# A tibble: 2 x 2
# a b
# <int> <dbl>
#1 1 9.00
#2 2 10.0
