Output function to create multiple dataframes by subsetted row

I am trying to create multiple data frames from a function, each one aggregating all rows up to a different week. For reference, I am using fantasy football data: I have each player's stats for every week, and I want one data frame per week containing each player's cumulative stats up to that week.
Here is the function I am currently using, which only creates one data frame, aggregating the weeks before week 17:
sumuptopoint <- function(dfx,i) { listofdfs <- list()
dfy <- dfx[, !sapply(dfx, is.character)]
{for (i in 1:17)
dft <- dfy[dfy$Week < i,]
y <<- as.data.frame(aggregate(dft, list("PlayerID" = dft$PlayerID), sum))
listofdfs[[i]] <- y}
return(listofdfs)}
I expect 17 aggregated data frames, but I only get one, in which all weeks before week 17 are aggregated.
Here is the df:
Team ByeWeek Rank.all PlayerID Name Position Week Opponent PassingCompletio~ PassingAttempts.~ PassingCompletio~ PassingYards.all PassingTouchdow~ PassingIntercep~ PassingRating.a~
<chr> <int> <int> <int> <chr> <chr> <dbl> <chr> <int> <int> <dbl> <int> <int> <int> <dbl>
1 ARI 12 201 19763 Josh ~ QB 8.00 SF 23 40 57.5 252 2 1 82.5
2 ARI 12 319 19763 Josh ~ QB 11.0 OAK 9 20 45.0 136 3 2 67.9
3 ARI 12 372 19763 Josh ~ QB 4.00 SEA 15 27 55.6 180 1 0 88.5
4 ARI 12 392 11527 Sam B~ QB 3.00 CHI 13 19 68.4 157 2 2 89.0
5 ARI 12 407 19763 Josh ~ QB 5.00 SF 10 25 40.0 170 1 0 77.1
6 ARI 12 411 19763 Josh ~ QB 10.0 KC 22 39 56.4 208 1 2 58.5
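In the function above, the opening brace sits in front of `for`, so only the line `dft <- ...` runs inside the loop; `aggregate()` and the list assignment run once afterwards with i = 17, which is why a single data frame comes back. A minimal sketch of a corrected version follows, assuming the columns PlayerID and Week shown above and that "cumulative until week i" means weeks 1 through i inclusive (with the original `Week < i`, week 1 would have no rows to aggregate). The `games` data frame is a hypothetical toy example, not the real data.

```r
# Hedged sketch: one cumulative data frame per week.
# Assumes columns PlayerID and Week exist and that "until week i" means Week <= i.
sumuptopoint <- function(dfx, weeks = 1:17) {
  # Keep only numeric columns (drops Team, Name, Position, Opponent)
  dfy <- dfx[, sapply(dfx, is.numeric), drop = FALSE]
  # Columns to total: PlayerID is the grouping key, and summing Week is meaningless
  stat_cols <- setdiff(names(dfy), c("PlayerID", "Week"))
  # lapply() builds the whole list, so no <<- or pre-allocated list is needed
  out <- lapply(weeks, function(i) {
    dft <- dfy[dfy$Week <= i, , drop = FALSE]
    aggregate(dft[stat_cols], by = list(PlayerID = dft$PlayerID), FUN = sum)
  })
  names(out) <- paste0("week", weeks)
  out
}

# Hypothetical toy data in the same shape as the question's data frame
games <- data.frame(
  PlayerID         = c(19763, 19763, 11527, 19763, 11527),
  Name             = c("Josh", "Josh", "Sam", "Josh", "Sam"),
  Week             = c(1, 2, 2, 3, 3),
  PassingYards.all = c(252, 136, 157, 180, 170)
)

res <- sumuptopoint(games, weeks = 1:3)
res$week3
```

Each element of `res` is a data frame with one row per player seen so far; `res$week2`, for example, holds the totals for weeks 1 and 2.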

Related

How to Subset dataframe in R by entity of multiple entities over time

I have an NCAA recruiting dataset with 200+ teams from 1980 to 2022 and many different ranking metrics. I am trying to split my data into individual tibbles/data frames, one per team over time. I thought of using a loop but can't wrap my head around how to code it.
Second, I am looking for extreme changes from one season to the next to determine whether a certain event caused an increase or decrease in the rank of recruits each team had. Someone correct me if my logic is wrong: could I use z-scores to determine this? If so, how would I do it?
To subset, I have tried this:
postCovid <- teamRecruitingScore %>% filter(yr>2018) %>% filter(topAvg != 0) %>% group_by(yr, topAvg)
and it returns this:
yr team tfsRk tfsSc…¹ rawRk rawAvg topRk topAvg total…² five four three
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019 Akron 112 130. 120 79.2 125 79.2 23 0 0 9
2 2019 Alabama 1 317. 1 94.4 1 95.4 27 3 23 1
3 2019 Appalach… 99 139. 88 81.5 92 81.5 18 0 0 16
4 2019 Arizona 54 183. 58 84.8 58 84.8 20 0 1 19
What I am trying to get is each team, with all of its years listed.
Any ideas?
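A minimal base-R sketch for both parts, assuming the columns yr, team and topAvg from the printout above; the `recruits` data frame and its values are made up for illustration. `split()` gives one data frame per team (dplyr's `group_split()` is a tidyverse analogue), and a z-score of the season-over-season change is one way to flag unusually large jumps. Standardizing within each team, as done here, is only one choice; standardizing across all teams, or against a pre-event baseline, may suit the question better.

```r
# Hypothetical toy stand-in for teamRecruitingScore (values are invented)
recruits <- data.frame(
  yr     = rep(2019:2022, times = 2),
  team   = rep(c("Akron", "Alabama"), each = 4),
  topAvg = c(79.2, 80.1, 78.5, 85.0, 95.4, 94.9, 95.8, 93.0)
)

# 1) One data frame per team, sorted by year: a named list keyed by team
by_team <- lapply(split(recruits, recruits$team), function(d) d[order(d$yr), ])

# 2) Season-over-season change within each team, then the z-score of that change
#    (change minus the team's mean change, divided by the SD of its changes)
by_team <- lapply(by_team, function(d) {
  d$change <- c(NA, diff(d$topAvg))
  d$z <- (d$change - mean(d$change, na.rm = TRUE)) / sd(d$change, na.rm = TRUE)
  d
})

by_team[["Alabama"]]
```

Rows with a large absolute z (a common rule of thumb is beyond about 2) are candidates for "extreme" changes; this is a screening heuristic, not a formal test of whether an event caused the change.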

How can I group and count observations made by decade in a dataset that has the years set as individual values?

I have the following dataset:
# A tibble: 90 × 2
decade n
<dbl> <int>
1 1930 13
2 1931 48
3 1932 44
4 1933 76
5 1934 73
6 1935 63
7 1936 54
8 1937 51
9 1938 41
10 1939 42
# … with 80 more rows
The years continue until 2010. I want to group them by decade (1930s, 1940s, etc.) and have another column with the total of n across all years in that decade.
For example:
# A tibble: 90 × 2
decade n
<dbl> <int>
1 1930-1939 505
2 1940-1949 516
Thanks!

We could use the modulo operator %%:
library(dplyr)
df %>%
group_by(decade = decade - decade %% 10) %>%
summarise(n = sum(n))
decade n
<dbl> <int>
1 1930 505
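The same modulo trick can be sketched in base R, which also builds the "1930-1939" style label asked for in the question. The ten rows below are the ones printed in the question; the `start` helper column is an illustrative name.

```r
# The ten rows shown in the question (1930-1939)
df <- data.frame(
  decade = 1930:1939,
  n      = c(13L, 48L, 44L, 76L, 73L, 63L, 54L, 51L, 41L, 42L)
)

# Floor each year to the start of its decade: 1937 - 1937 %% 10 = 1930
start <- df$decade - df$decade %% 10

# Sum n per decade start, then build a "1930-1939" style label
out <- aggregate(list(n = df$n), by = list(start = start), FUN = sum)
out$decade <- paste0(out$start, "-", out$start + 9)
out[, c("decade", "n")]
```

With the full 1930 to 2010 data, the same code returns one row per decade.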

Conditionally replace all records for group_by if condition is met once dplyr ifelse

I am trying to replace all values of nat_locx within an id (my group_by() variable) with the first LOCX value of that id, if multiple conditions are met at least once for that id.
Here is an example of my data:
id DATE nat_locx LOCX distance loc_age
<fct> <date> <dbl> <dbl> <dbl> <dbl>
6553 2004-06-27 13.5 2 487.90 26
6553 2004-07-14 13.5 13.5 0 43
6553 2004-07-15 13.5 12.5 30 44
6553 2004-07-25 13.5 14.5 44.598 54
6081 2004-07-05 13 14.2 40.249 44
6081 2004-07-20 13 13.8 61.847 49
The way I have tried to do this is like so:
df<-df %>%
group_by(id) %>%
mutate(nat_locx=ifelse(loc_age>25 & loc_age<40 & distance>30, first(LOCX), nat_locx))
However, when I do this, it only replaces the first row with the first value from the LOCX column instead of all the nat_locx values for my group_by variable (id).
Ideally, I'd like this output:
id DATE nat_locx LOCX distance loc_age
<fct> <date> <dbl> <dbl> <dbl> <dbl>
6553 2004-06-27 2 2 487.90 26
6553 2004-07-14 2 13.5 0 43
6553 2004-07-15 2 12.5 30 44
6553 2004-07-25 2 14.5 44.598 54
6081 2004-07-05 13 14.2 40.249 44
6081 2004-07-20 13 13.8 61.847 49
A dplyr solution is preferred.

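We could use a classic, non-vectorized if/else statement. Wrapping the condition in any() turns it into a single TRUE/FALSE per group, so the whole group is either replaced or kept (since R 4.2.0, if() with a condition of length greater than one is an error):
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(nat_locx = if (any(loc_age > 25 &
                            loc_age < 40 &
                            distance > 30)) {
    first(LOCX)
  } else {
    nat_locx
  })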
id DATE nat_locx LOCX distance loc_age
<int> <chr> <dbl> <dbl> <dbl> <int>
1 6553 2004-06-27 2 2 488. 26
2 6553 2004-07-14 2 13.5 0 43
3 6553 2004-07-15 2 12.5 30 44
4 6553 2004-07-25 2 14.5 44.6 54
5 6081 2004-07-05 13 14.2 40.2 44
6 6081 2004-07-20 13 13.8 61.8 49

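We may use replace(). Passing any(...) as the index gives a single TRUE/FALSE per group, so either every row of the group is replaced or none is:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(nat_locx =
           replace(nat_locx,
                   any(loc_age > 25 & loc_age < 40 & distance > 30),
                   first(LOCX)))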
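For comparison, a base-R sketch of the same group-level logic using `ave()`, which applies a function within each id and returns the result aligned to every row. The data frame reproduces the example rows from the question (DATE omitted); `hit`, `group_hit` and `first_locx` are illustrative names.

```r
# Example rows from the question (DATE omitted)
df <- data.frame(
  id       = c(6553, 6553, 6553, 6553, 6081, 6081),
  nat_locx = c(13.5, 13.5, 13.5, 13.5, 13, 13),
  LOCX     = c(2, 13.5, 12.5, 14.5, 14.2, 13.8),
  distance = c(487.90, 0, 30, 44.598, 40.249, 61.847),
  loc_age  = c(26, 43, 44, 54, 44, 49)
)

# Row-level condition
hit <- df$loc_age > 25 & df$loc_age < 40 & df$distance > 30

# TRUE on every row of an id if the condition holds at least once in that id
group_hit <- ave(hit, df$id, FUN = any)

# First LOCX of each id, repeated on every row of that id
first_locx <- ave(df$LOCX, df$id, FUN = function(x) x[1])

df$nat_locx <- ifelse(group_hit, first_locx, df$nat_locx)
df
```

The key step in every version is collapsing the row-level condition to one value per group before deciding what to write back.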

How to add two data frames together in R?

I have a data frame split by ownership: private (50) and state (30). I am looking to create five new rows that are the sum of the ownership 50 and ownership 30 rows wherever they share the same area value. The desired result is below.
naics <- c(611,611,611,611,611,611,611,611,611,611)
ownership <- c(50,50,50,50,50,30,30,30,30,10)
area <- c(001,003,005,009,011,001,003,005,011,001)
d200201 <- c(14,17,20,23,26,3,5,7,9,100)
d200202 <- c(15,18,21,24,28,9,11,13,15,105)
private <- data.frame(naics,ownership,area,d200201,d200202)
naics ownership area d200201 d200202
611 50 001 17 24
611 50 003 22 29
611 50 005 27 34
611 50 009 23 24 (no sum because no 30 value)
611 50 011 35 43

Is this what you are looking for?
library(dplyr)
private %>%
group_by(naics, area) %>%
summarize(
across(c(d200201, d200202), ~sum(.x[ownership %in% c(30, 50)])),
ownership = 50, .groups = "drop"
)
Output
# A tibble: 5 x 5
naics area d200201 d200202 ownership
<dbl> <dbl> <dbl> <dbl> <dbl>
1 611 1 17 24 50
2 611 3 22 29 50
3 611 5 27 34 50
4 611 9 23 24 50
5 611 11 35 43 50

library(tidyverse)
private %>%
filter(ownership %in% c(50, 30)) %>%
group_by(area) %>%
summarize(across(starts_with("d200"), sum))
#> # A tibble: 5 × 3
#> area d200201 d200202
#> <dbl> <dbl> <dbl>
#> 1 1 17 24
#> 2 3 22 29
#> 3 5 27 34
#> 4 9 23 24
#> 5 11 35 43
Created on 2022-01-08 by the reprex package (v2.0.1)
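The same filter-then-sum idea can be sketched in base R with the formula interface of `aggregate()`, using the question's own vectors. Setting `ownership <- 50` afterwards simply labels the combined rows as in the desired result; that labeling is an assumption about what the asker wants.

```r
naics     <- c(611, 611, 611, 611, 611, 611, 611, 611, 611, 611)
ownership <- c(50, 50, 50, 50, 50, 30, 30, 30, 30, 10)
area      <- c(001, 003, 005, 009, 011, 001, 003, 005, 011, 001)
d200201   <- c(14, 17, 20, 23, 26, 3, 5, 7, 9, 100)
d200202   <- c(15, 18, 21, 24, 28, 9, 11, 13, 15, 105)
private   <- data.frame(naics, ownership, area, d200201, d200202)

# Keep only private (50) and state (30) rows; ownership 10 is excluded
sub <- private[private$ownership %in% c(50, 30), ]

# Sum both d200* columns per naics/area; an area with no 30 row keeps its 50 values
out <- aggregate(cbind(d200201, d200202) ~ naics + area, data = sub, FUN = sum)
out$ownership <- 50
out
```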

dplyr summarize output - how to save it

I need to calculate summary statistics for observations of bird breeding activity for each of 150 species. The data frame has the species (sCodef), the type of observation (codef, e.g. nest building), and the ordinal date (jdate, days since 1 January, since the data were collected over multiple years). Using dplyr I get exactly the result I want:
library(dplyr)
library(tidyr)
phenology %>% group_by(sCodef, codef) %>%
summarize(N=n(), Min=min(jdate), Max=max(jdate), Median=median(jdate))
# A tibble: 552 x 6
# Groups: sCodef [?]
sCodef codef N Min Max Median
<fct> <fct> <int> <dbl> <dbl> <dbl>
1 ABDU AY 3 172 184 181
2 ABDU FL 12 135 225 188
3 ACFL AY 18 165 222 195
4 ACFL CN 4 142 156 152.
5 ACFL FL 10 166 197 192.
6 ACFL NB 6 139 184 150.
7 ACFL NY 6 166 207 182
8 AMCO FL 1 220 220 220
9 AMCR AY 53 89 198 161
10 AMCR FL 78 133 225 166.
# ... with 542 more rows
How do I get these summary statistics into a data object so that I can export them, ultimately for use in a Word document? All of the many explanations of summarize I have reviewed just show the summary data on screen. I tried the following and got an error:
out3 <- summarize(N=n(), Min=min(jdate), Max=max(jdate), median=median(jdate))
Error: This function should not be called directly

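Keep the pipe, since summarize() needs the grouped data as its input, assign the result to a variable, then write it to a CSV file like so:
summarydf <- phenology %>% group_by(sCodef, codef) %>%
  summarize(N=n(), Min=min(jdate), Max=max(jdate), Median=median(jdate))
write.csv(summarydf, file="yourfilenamehere.csv", row.names=FALSE)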
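A self-contained base-R sketch of the same save-to-file idea, in case dplyr is not at hand. The tiny `phenology` data frame is hypothetical, and the file is written to a temporary directory so the example does not clutter the working directory. `aggregate()` stores the four statistics as a matrix column, so `do.call(data.frame, ...)` flattens them into ordinary columns (named jdate.N, jdate.Min, and so on) before writing.

```r
# Hypothetical toy data standing in for `phenology`
phenology <- data.frame(
  sCodef = c("ABDU", "ABDU", "ABDU", "ACFL", "ACFL"),
  codef  = c("AY", "AY", "FL", "CN", "CN"),
  jdate  = c(172, 184, 135, 142, 156)
)

# N, Min, Max and Median of jdate for each species/observation-type pair
stats <- function(x) c(N = length(x), Min = min(x), Max = max(x), Median = median(x))
summarydf <- aggregate(jdate ~ sCodef + codef, data = phenology, FUN = stats)
summarydf <- do.call(data.frame, summarydf)

# Save to CSV, which Word (via Excel or a table import) can use
path <- file.path(tempdir(), "phenology_summary.csv")
write.csv(summarydf, file = path, row.names = FALSE)
res <- read.csv(path)
res
```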
