I have an NCAA recruiting dataset covering 200+ teams from 1980 to 2022, with many different ranking metrics. I am trying to split my data into individual tibbles/data frames, one per team, showing that team over time. I thought of using a loop but can't work out how to code it.
Second, I am looking for extreme changes from one season to the next, to determine whether a certain event caused an increase or decrease in the rank of recruits each team had. Someone correct me if my logic is wrong: could I use z-scores to determine this? And if so, how would I do it?
To subset, I have tried this:
postCovid <- teamRecruitingScore %>% filter(yr>2018) %>% filter(topAvg != 0) %>% group_by(yr, topAvg)
and it returns this:
yr team tfsRk tfsSc…¹ rawRk rawAvg topRk topAvg total…² five four three
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019 Akron 112 130. 120 79.2 125 79.2 23 0 0 9
2 2019 Alabama 1 317. 1 94.4 1 95.4 27 3 23 1
3 2019 Appalach… 99 139. 88 81.5 92 81.5 18 0 0 16
4 2019 Arizona 54 183. 58 84.8 58 84.8 20 0 1 19
What I am trying to get is each team, with all of its years listed.
Any ideas?
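A minimal sketch of both parts, assuming a data frame with the yr, team and topAvg columns shown above (the toy values below are made up for illustration). It groups by team rather than by yr and topAvg, uses base split() to get one tibble per team, and computes a within-team z-score of each season-to-season change with dplyr's lag(). The 2-standard-deviation cutoff is an arbitrary choice, not a rule.

```r
library(dplyr)

# Hypothetical toy data standing in for teamRecruitingScore (only yr, team, topAvg)
teamRecruitingScore <- tibble(
  yr     = rep(2012:2021, times = 2),
  team   = rep(c("Akron", "Alabama"), each = 10),
  topAvg = c(79, 79, 79, 79, 79, 79, 79, 86, 86, 86,
             94.0, 94.2, 94.1, 94.3, 94.2, 94.4, 94.3, 94.5, 94.4, 94.6)
)

# Part 1: one tibble per team, seasons in chronological order.
# split() returns a named list, so by_team$Alabama is Alabama's tibble.
sorted  <- arrange(teamRecruitingScore, team, yr)
by_team <- split(sorted, sorted$team)

# Part 2: change in topAvg from the previous season, standardized within each team
changes <- teamRecruitingScore %>%
  group_by(team) %>%
  arrange(yr, .by_group = TRUE) %>%
  mutate(
    change = topAvg - dplyr::lag(topAvg),  # NA for each team's first season
    z = (change - mean(change, na.rm = TRUE)) / sd(change, na.rm = TRUE)
  ) %>%
  ungroup()

# Seasons whose change is more than 2 SDs away from that team's typical change
flagged <- changes %>% filter(abs(z) > 2)
flagged
```

Here the z-score is relative to each team's own history, so it asks "was this change unusual for this team?"; dropping the group_by(team) would standardize against all teams instead, which is a different question. With only a few seasons per team the largest attainable z-score is limited, and a flagged season only shows an unusual change, not that a particular event caused it.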
I have the following dataset:
# A tibble: 90 × 2
decade n
<dbl> <int>
1 1930 13
2 1931 48
3 1932 44
4 1933 76
5 1934 73
6 1935 63
7 1936 54
8 1937 51
9 1938 41
10 1939 42
# … with 80 more rows
The years continue until 2010 (the column named decade actually holds individual years). I wish to "group" by decade (1930s, 1940s, etc.) and have another column with the sum of n over all years in that decade.
For example:
# A tibble: 90 × 2
decade n
<dbl> <int>
1 1930-1939 449
2 1940-1949 516
Thanks!
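We could use the modulo operator %% to round each year down to the first year of its decade (for example, 1937 - 1937 %% 10 is 1930), then sum n within each decade: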
library(dplyr)
df %>%
group_by(decade = decade - decade %% 10) %>%
summarise(n = sum(n))
decade n
<dbl> <int>
1 1930 505
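If the "1930-1939" style labels from the question are wanted, a small extension of the same %% idea can build them from the start year. This is a sketch using only the ten rows shown in the question; the helper column name start is introduced here for illustration.

```r
library(dplyr)

# The ten rows shown in the question; the real data continues to 2010
df <- tibble(
  decade = c(1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939),
  n      = c(13L, 48L, 44L, 76L, 73L, 63L, 54L, 51L, 41L, 42L)
)

by_decade <- df %>%
  # Round each year down to the first year of its decade, e.g. 1937 -> 1930
  group_by(start = decade - decade %% 10) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  # Turn the start year into a "1930-1939" style label
  mutate(decade = paste0(start, "-", start + 9)) %>%
  select(decade, n)

by_decade
```

On these ten rows this gives a single row, 1930-1939 with n = 505; on the full data there would be one row per decade. Note the label column is character, not <dbl>.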
I am trying to replace all values of nat_locx within an id (my group_by() variable) with the first LOCX value for that id, whenever several conditions are met on at least one row of that id.
Here is an example of my data:
id DATE nat_locx LOCX distance loc_age
<fct> <date> <dbl> <dbl> <dbl> <dbl>
6553 2004-06-27 13.5 2 487.90 26
6553 2004-07-14 13.5 13.5 0 43
6553 2004-07-15 13.5 12.5 30 44
6553 2004-07-25 13.5 14.5 44.598 54
6081 2004-07-05 13 14.2 40.249 44
6081 2004-07-20 13 13.8 61.847 49
The way I have tried to do this is like so:
df<-df %>%
group_by(id) %>%
mutate(nat_locx=ifelse(loc_age>25 & loc_age<40 & distance>30, first(LOCX), nat_locx))
However, when I do this, it only replaces the first row with the first value from the LOCX column instead of all the nat_locx values for my group_by variable (id).
Ideally, I'd like this output:
id DATE nat_locx LOCX distance loc_age
<fct> <date> <dbl> <dbl> <dbl> <dbl>
6553 2004-06-27 2 2 487.90 26
6553 2004-07-14 2 13.5 0 43
6553 2004-07-15 2 12.5 30 44
6553 2004-07-25 2 14.5 44.598 54
6081 2004-07-05 13 14.2 40.249 44
6081 2004-07-20 13 13.8 61.847 49
A dplyr solution is preferred.
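We could use a classic non-vectorized if/else statement. Wrapping the conditions in any() reduces them to a single TRUE/FALSE per id, which matches "met once or more"; without it, if() receives a condition of length > 1, which is an error in R >= 4.2.0 (earlier versions silently used only the first row, with a warning):
df %>%
  group_by(id) %>%
  mutate(nat_locx = if (any(loc_age > 25 &
                            loc_age < 40 &
                            distance > 30)) {
    first(LOCX)
  } else {
    nat_locx
  })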
id DATE nat_locx LOCX distance loc_age
<int> <chr> <dbl> <dbl> <dbl> <int>
1 6553 2004-06-27 2 2 488. 26
2 6553 2004-07-14 2 13.5 0 43
3 6553 2004-07-15 2 12.5 30 44
4 6553 2004-07-25 2 14.5 44.6 54
5 6081 2004-07-05 13 14.2 40.2 44
6 6081 2004-07-20 13 13.8 61.8 49
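We may need replace(), again with any() so that the whole group is replaced when the condition holds on any row (a per-row condition would only replace the matching rows, which is the problem in the question):
df %>%
  group_by(id) %>%
  mutate(nat_locx = replace(nat_locx,
                            any(loc_age > 25 & loc_age < 40 & distance > 30),
                            first(LOCX)))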
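To check the any() approach against the sample rows from the question, here is a self-contained sketch. It adds a helper column, hit (a name introduced here for illustration), in a first mutate() so that dplyr's vectorized if_else() can be used in a second one instead of if ().

```r
library(dplyr)

# Sample rows copied from the question
df <- tibble(
  id       = factor(c(6553, 6553, 6553, 6553, 6081, 6081)),
  DATE     = as.Date(c("2004-06-27", "2004-07-14", "2004-07-15",
                       "2004-07-25", "2004-07-05", "2004-07-20")),
  nat_locx = c(13.5, 13.5, 13.5, 13.5, 13, 13),
  LOCX     = c(2, 13.5, 12.5, 14.5, 14.2, 13.8),
  distance = c(487.90, 0, 30, 44.598, 40.249, 61.847),
  loc_age  = c(26, 43, 44, 54, 44, 49)
)

result <- df %>%
  group_by(id) %>%
  # TRUE on every row of an id if the conditions hold on at least one of its rows
  mutate(hit = any(loc_age > 25 & loc_age < 40 & distance > 30)) %>%
  # Still grouped by id, so first(LOCX) is the first LOCX of that id
  mutate(nat_locx = if_else(hit, first(LOCX), nat_locx)) %>%
  ungroup() %>%
  select(-hit)

result
```

For id 6553 all four nat_locx values become 2, and id 6081 keeps 13, matching the desired output.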
I have a data frame split by ownership code: private (50) and state (30). I am looking to create 5 new rows, each the sum of the ownership 50 and ownership 30 rows that share the same area value. The desired result is below.
naics <- c(611,611,611,611,611,611,611,611,611,611)
ownership <- c(50,50,50,50,50,30,30,30,30,10)
area <- c(001,003,005,009,011,001,003,005,011,001)
d200201 <- c(14,17,20,23,26,3,5,7,9,100)
d200202 <- c(15,18,21,24,28,9,11,13,15,105)
private <- data.frame(naics,ownership,area,d200201,d200202)
naics ownership area d200201 d200202
611 50 001 17 24
611 50 003 22 29
611 50 005 27 34
611 50 009 23 24 (no sum because no 30 value)
611 50 011 35 43
Is this what you are looking for?
library(dplyr)
private %>%
group_by(naics, area) %>%
summarize(
across(c(d200201, d200202), ~sum(.x[ownership %in% c(30, 50)])),
ownership = 50, .groups = "drop"
)
Output
# A tibble: 5 x 5
naics area d200201 d200202 ownership
<dbl> <dbl> <dbl> <dbl> <dbl>
1 611 1 17 24 50
2 611 3 22 29 50
3 611 5 27 34 50
4 611 9 23 24 50
5 611 11 35 43 50
library(tidyverse)
private %>%
filter(ownership %in% c(50, 30)) %>%
group_by(area) %>%
summarize(across(starts_with("d200"), sum))
#> # A tibble: 5 × 3
#> area d200201 d200202
#> <dbl> <dbl> <dbl>
#> 1 1 17 24
#> 2 3 22 29
#> 3 5 27 34
#> 4 9 23 24
#> 5 11 35 43
Created on 2022-01-08 by the reprex package (v2.0.1)
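For comparison, a base R sketch of the same sum without dplyr, using aggregate() with a formula on the private data frame from the question. Labelling the combined rows with ownership 50 mirrors the desired result and the first answer; it is a choice, not something aggregate() does itself.

```r
# Data from the question
naics <- c(611,611,611,611,611,611,611,611,611,611)
ownership <- c(50,50,50,50,50,30,30,30,30,10)
area <- c(001,003,005,009,011,001,003,005,011,001)
d200201 <- c(14,17,20,23,26,3,5,7,9,100)
d200202 <- c(15,18,21,24,28,9,11,13,15,105)
private <- data.frame(naics, ownership, area, d200201, d200202)

# Keep private (50) and state (30) rows, then sum each d* column per naics/area
summed <- aggregate(
  cbind(d200201, d200202) ~ naics + area,
  data = subset(private, ownership %in% c(30, 50)),
  FUN = sum
)
summed$ownership <- 50  # label the combined rows with the private code
summed
```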
I need to calculate summary statistics for observations of bird breeding activity for each of 150 species. The data frame has the species (sCodef), the type of observation (codef) (e.g. nest building), and the ordinal date (days since 1 January, since the data were collected over multiple years). Using dplyr I get exactly the result I want.
library(dplyr)
library(tidyr)
phenology %>% group_by(sCodef, codef) %>%
summarize(N=n(), Min=min(jdate), Max=max(jdate), Median=median(jdate))
# A tibble: 552 x 6
# Groups: sCodef [?]
sCodef codef N Min Max Median
<fct> <fct> <int> <dbl> <dbl> <dbl>
1 ABDU AY 3 172 184 181
2 ABDU FL 12 135 225 188
3 ACFL AY 18 165 222 195
4 ACFL CN 4 142 156 152.
5 ACFL FL 10 166 197 192.
6 ACFL NB 6 139 184 150.
7 ACFL NY 6 166 207 182
8 AMCO FL 1 220 220 220
9 AMCR AY 53 89 198 161
10 AMCR FL 78 133 225 166.
# ... with 542 more rows
How do I get these summary statistics into a data object so that I can export them, ultimately for use in a Word document? I have tried the following and got an error. All of the many explanations of summarize I have reviewed just show the summary data on screen. Thanks
out3 <- summarize(N=n(), Min=min(jdate), Max=max(jdate), median=median(jdate))
Error: This function should not be called directly
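Assign the result of the whole pipeline to a variable, then write it to a CSV file. The error comes from calling summarize() on its own, without the grouped phenology data as its input. Note that write.csv() takes the path through its file argument:
summarydf <- phenology %>%
  group_by(sCodef, codef) %>%
  summarize(N = n(), Min = min(jdate), Max = max(jdate), Median = median(jdate))
write.csv(summarydf, file = "yourfilenamehere.csv", row.names = FALSE)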
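A self-contained sketch of the assign-then-export step, using a small hypothetical stand-in for the phenology data (the rows are made up; only the column names sCodef, codef and jdate come from the question). It writes to a temporary file and reads it back to show that the summary table, not just the printed output, was saved.

```r
library(dplyr)

# Hypothetical stand-in for the question's phenology data
phenology <- tibble(
  sCodef = c("ABDU", "ABDU", "ABDU", "ACFL", "ACFL"),
  codef  = c("AY", "AY", "AY", "CN", "CN"),
  jdate  = c(172, 181, 184, 142, 156)
)

# Store the summary in an object instead of only printing it
summarydf <- phenology %>%
  group_by(sCodef, codef) %>%
  summarize(N = n(), Min = min(jdate), Max = max(jdate),
            Median = median(jdate), .groups = "drop")

# tempfile() keeps the example tidy; use a real path such as "phenology_summary.csv"
out_file <- tempfile(fileext = ".csv")
write.csv(summarydf, file = out_file, row.names = FALSE)

# Read it back to confirm what was written
check <- read.csv(out_file)
check
```

The CSV can then be opened in a spreadsheet and copied into Word; packages such as flextable or officer can also write Word documents directly from R, but that is beyond this sketch.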