All data in a row: move observations under each other (R)

I have the following data:
data <- list(list(eventId = 8, subEventName = "Simple pass", tags = list(
list(id = 1801)), playerId = 122671, positions = list(list(
y = 50, x = 50), list(y = 53, x = 35)), matchId = 2057954,
eventName = "Pass", teamId = 16521, matchPeriod = "1H", eventSec = 1.656214,
subEventId = 85, id = 258612104), list(eventId = 8, subEventName = "High pass",
tags = list(list(id = 1801)), playerId = 139393, positions = list(
list(y = 53, x = 35), list(y = 19, x = 75)), matchId = 2057954,
eventName = "Pass", teamId = 16521, matchPeriod = "1H", eventSec = 4.487814,
subEventId = 83, id = 258612106))
I want to create a data frame from this list. Using unlist(data) produces a single long row in which the variables repeat:
> unlist(data)
eventId subEventName tags.id playerId positions.y positions.x positions.y
"8" "Simple pass" "1801" "122671" "50" "50" "53"
positions.x matchId eventName teamId matchPeriod eventSec subEventId
"35" "2057954" "Pass" "16521" "1H" "1.656214" "85"
id eventId subEventName tags.id playerId positions.y positions.x
"258612104" "8" "High pass" "1801" "139393" "53" "35"
positions.y positions.x matchId eventName teamId matchPeriod eventSec
"19" "75" "2057954" "Pass" "16521" "1H" "4.487814"
subEventId id
"83" "258612106"
Each observation starts with the eventId variable, so basically I need to split the data into pieces that each start at eventId and then stack those pieces one under the other, i.e. end up with two observations in this case. Any ideas? Thanks in advance.

Try tibblify:
library(tibblify)
tibblify(data)
## A tibble: 2 x 12
# eventId subEventName tags playerId positions matchId eventName teamId matchPeriod eventSec subEventId id
# <dbl> <chr> <list<tbl_df[,1]>> <dbl> <list<tbl_df[,2]>> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
#1 8 Simple pass [1 x 1] 122671 [2 x 2] 2057954 Pass 16521 1H 1.66 85 258612104
#2 8 High pass [1 x 1] 139393 [2 x 2] 2057954 Pass 16521 1H 4.49 83 258612106

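You can use rbindlist from data.table. Note that because positions holds two elements per event, the other fields are recycled and each event appears twice in the result: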
result <- data.table::rbindlist(data)
result
# eventId subEventName tags playerId positions matchId eventName teamId matchPeriod eventSec subEventId id
#1: 8 Simple pass <list[1]> 122671 <list[2]> 2057954 Pass 16521 1H 1.66 85 2.59e+08
#2: 8 Simple pass <list[1]> 122671 <list[2]> 2057954 Pass 16521 1H 1.66 85 2.59e+08
#3: 8 High pass <list[1]> 139393 <list[2]> 2057954 Pass 16521 1H 4.49 83 2.59e+08
#4: 8 High pass <list[1]> 139393 <list[2]> 2057954 Pass 16521 1H 4.49 83 2.59e+08

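Does this help solve your problem? It assumes every observation unlists to the same number of values (15 here) in the same order; all resulting columns are character.
df <- data.frame(matrix(unlist(data), nrow = length(data), byrow = TRUE))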
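A base R sketch that follows the question's own idea more literally: flatten each observation separately with unlist(), make the repeated names (positions.y, positions.x) unique with make.unique(), then stack the rows. It assumes every observation has the same fields in the same order (rbind() matches by position). The `_1` suffix for the second position is an arbitrary naming choice, and type.convert() is used to turn the all-character result back into numbers where possible.

```r
# The two events from the question
data <- list(
  list(eventId = 8, subEventName = "Simple pass", tags = list(list(id = 1801)),
       playerId = 122671, positions = list(list(y = 50, x = 50), list(y = 53, x = 35)),
       matchId = 2057954, eventName = "Pass", teamId = 16521, matchPeriod = "1H",
       eventSec = 1.656214, subEventId = 85, id = 258612104),
  list(eventId = 8, subEventName = "High pass", tags = list(list(id = 1801)),
       playerId = 139393, positions = list(list(y = 53, x = 35), list(y = 19, x = 75)),
       matchId = 2057954, eventName = "Pass", teamId = 16521, matchPeriod = "1H",
       eventSec = 4.487814, subEventId = 83, id = 258612106)
)

# Flatten one observation at a time, so each event becomes one row
rows <- lapply(data, function(obs) {
  v <- unlist(obs)  # named character vector, 15 values per event here
  # positions.y / positions.x occur twice; the second pair becomes positions.y_1 / positions.x_1
  names(v) <- make.unique(names(v), sep = "_")
  v
})

# rbind() stacks the vectors into a character matrix; column names come from the first row
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
# Convert columns back to integer/numeric where possible
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

The result has one row per event and 15 columns, with the second position stored in positions.y_1 and positions.x_1.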

Related

How to read files in two separate lists in a function based on a condition in R

Okay, I hope I manage to sum up what I need to achieve. I am running experiments in which I obtain data from two different sources, with date_time being the variable that links them. The data from the two sources have the same structure (csv or txt); the only distinction is in the file names. I provide an example below:
list_of_files <- structure(
list
(
solid_epoxy1_10 = data.frame(
date_time = c("20/07/2022 13:46",
"20/07/2022 13:56",
"20/07/2022 14:06"),
frequency = c("30000",
"31000",
"32000"),
index = c("1", "2", "3")
),
solid_otherpaint_20 = data.frame(
date_time = c("20/07/2022 13:10",
"20/07/2022 13:20",
"20/07/2022 14:30"),
frequency = c("20000",
"21000",
"22000"),
index = c("1", "2", "3")
),
water_epoxy1_10 = data.frame(
date_time = c("20/07/2022 13:46",
"20/07/2022 13:56",
"20/07/2022 14:06"),
temperature = c("22.3",
"22.6",
"22.5")
),
water_otherpaint_20 = data.frame(
date_time = c("20/07/2022 13:10",
"20/07/2022 13:20",
"20/07/2022 14:30"),
temperature = c("24.5",
"24.6",
"24.8")
)
)
)
First I want to read the data files into two separate lists: one for the files whose name contains the keyword "solid", and another for those whose name contains "water".
Then I need to create new columns in each data frame from the file name, split on "_" (e.g. paint = "epoxy1", thickness = "10"), so that I can do an inner join by date_time, paint, thickness, etc. What I'm struggling with so far is writing a function that loads the files into two separate lists. This is what I've tried so far:
load_files <-
function(list_of_files) {
all.files.board <- list()
all.files.temp <- list()
for (i in 1:length(list_of_files))
{
if (exists("board")) {
all.files.board[[i]] = fread(list_of_files[i])
}
else{
all.files.temp[[i]] = fread(list_of_files[i])
}
return(list(all.files.board, all.files.temp))
}
}
But it doesn't do what I need. I hope I made it as clear as possible. I'm fairly comfortable with the tidyverse, but still a newbie at writing custom functions. Any ideas are welcome.
Regarding the question in the title:
The first issue, calling return() too early and thus breaking out of the for loop, was already mentioned in the comments and should be sorted.
The next one is the condition itself: if (exists("board")) {} checks whether there is an object called board. In the provided sample it evaluates to TRUE only if something was assigned to a global board object before calling load_files(), and to FALSE only if there was no such assignment or board was explicitly removed. I.e. with
board <- "something"; dataframes <- load_files(file_list) the check will be TRUE, while with
rm(board); dataframes <- load_files(file_list) it will be FALSE. Nothing in the function itself changes the "existence" of board, so the result is determined before the function is even called.
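To illustrate both fixes while keeping the two-list idea, here is a minimal base R sketch (not the asker's code): the solid/water decision is taken from each file name with grepl(), and the result is returned only after every file has been read. It swaps data.table::fread() for read.csv() to stay dependency-free; the demo files are written to a temporary directory.

```r
load_files <- function(file_paths) {
  file_names <- basename(file_paths)
  # Classify each file by its name instead of calling exists()
  solid_paths <- file_paths[grepl("solid", file_names, fixed = TRUE)]
  water_paths <- file_paths[grepl("water", file_names, fixed = TRUE)]

  read_all <- function(paths) {
    dfs <- lapply(paths, read.csv, colClasses = "character")
    # Name each data frame after its file, e.g. "solid_epoxy1_10"
    setNames(dfs, tools::file_path_sans_ext(basename(paths)))
  }

  # The list is returned only after all files have been read
  list(solid = read_all(solid_paths), water = read_all(water_paths))
}

# Usage with two small demo files
tmp_dir <- file.path(tempdir(), "load_files_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(date_time = "20/07/2022 13:46", frequency = "30000", index = "1"),
          file.path(tmp_dir, "solid_epoxy1_10.csv"), row.names = FALSE)
write.csv(data.frame(date_time = "20/07/2022 13:46", temperature = "22.3"),
          file.path(tmp_dir, "water_epoxy1_10.csv"), row.names = FALSE)

loaded <- load_files(list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE))
str(loaded)
```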
If the core of the question is about joining 2 somewhat different datasets and splitting the result by groups, I'd just drop the loops, conditions and most of the lists involved, and go with something like this in the tidyverse:
library(fs)
library(readr)
library(stringr)
library(dplyr)
library(tidyr)
# prepare input files for sample ------------------------------------------
sample_dfs <- structure(
list
(
solid_epoxy1_10 = data.frame(
date_time = c("20/07/2022 13:46", "20/07/2022 13:56", "20/07/2022 14:06"),
frequency = c("30000", "31000", "32000"),
index = c("1", "2", "3")
),
solid_otherpaint_20 = data.frame(
date_time = c("20/07/2022 13:10", "20/07/2022 13:20", "20/07/2022 14:30"),
frequency = c("20000", "21000", "22000"),
index = c("1", "2", "3")
),
water_epoxy1_10 = data.frame(
date_time = c("20/07/2022 13:46", "20/07/2022 13:56", "20/07/2022 14:06"),
temperature = c("22.3", "22.6", "22.5")
),
water_otherpaint_20 = data.frame(
date_time = c("20/07/2022 13:10", "20/07/2022 13:20", "20/07/2022 14:30"),
temperature = c("24.5", "24.6", "24.8")
)
)
)
tmp_path <- file_temp("reprex")
dir_create(tmp_path)
sample_filenames <- str_glue("{1:length(sample_dfs)}_{names(sample_dfs)}.csv")
for (i in seq_along(sample_dfs)) {
write_csv(sample_dfs[[i]], path(tmp_path, sample_filenames[i]))
}
dir_ls(tmp_path, type = "file")
#> Temp/RtmpqUoct8/reprex5cc517f177b/1_solid_epoxy1_10.csv
#> Temp/RtmpqUoct8/reprex5cc517f177b/2_solid_otherpaint_20.csv
#> Temp/RtmpqUoct8/reprex5cc517f177b/3_water_epoxy1_10.csv
#> Temp/RtmpqUoct8/reprex5cc517f177b/4_water_otherpaint_20.csv
# read files --------------------------------------------------------------
t_solid <- dir_ls(tmp_path, glob = "*solid*.csv", type = "file") %>%
read_csv(id = "filename") %>%
extract(filename, c("paint", "thickness"), "_([^_]+)_(\\d+)\\.csv")
t_solid
#> # A tibble: 6 × 5
#> paint thickness date_time frequency index
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 epoxy1 10 20/07/2022 13:46 30000 1
#> 2 epoxy1 10 20/07/2022 13:56 31000 2
#> 3 epoxy1 10 20/07/2022 14:06 32000 3
#> 4 otherpaint 20 20/07/2022 13:10 20000 1
#> 5 otherpaint 20 20/07/2022 13:20 21000 2
#> 6 otherpaint 20 20/07/2022 14:30 22000 3
t_water <- dir_ls(tmp_path, glob = "*water*.csv", type = "file") %>%
read_csv(id = "filename") %>%
extract(filename, c("paint", "thickness"), "_([^_]+)_(\\d+)\\.csv")
t_water
#> # A tibble: 6 × 4
#> paint thickness date_time temperature
#> <chr> <chr> <chr> <dbl>
#> 1 epoxy1 10 20/07/2022 13:46 22.3
#> 2 epoxy1 10 20/07/2022 13:56 22.6
#> 3 epoxy1 10 20/07/2022 14:06 22.5
#> 4 otherpaint 20 20/07/2022 13:10 24.5
#> 5 otherpaint 20 20/07/2022 13:20 24.6
#> 6 otherpaint 20 20/07/2022 14:30 24.8
# or implement as a function ----------------------------------------------
load_files <- function(csv_path, glob = "*.csv") {
return(
dir_ls(csv_path, glob = glob, type = "file") %>%
# store filenames in filename column
read_csv(id = "filename", show_col_types = FALSE) %>%
# extract each regex group to its own column
extract(filename, c("paint", "thickness"), "_([^_]+)_(\\d+)\\.csv"))
}
# join / group / split ----------------------------------------------------
t_solid <- load_files(tmp_path, "*solid*.csv")
t_water <- load_files(tmp_path, "*water*.csv")
# either join by multiple columns or select only required cols
# to avoid x.* & y.* columns in result
inner_join(t_solid, t_water, by = c("date_time", "paint", "thickness")) %>%
group_by(paint) %>%
group_split()
Final result as a list of tibbles:
#> <list_of<
#> tbl_df<
#> paint : character
#> thickness : character
#> date_time : character
#> frequency : double
#> index : double
#> temperature: double
#> >
#> >[2]>
#> [[1]]
#> # A tibble: 3 × 6
#> paint thickness date_time frequency index temperature
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 epoxy1 10 20/07/2022 13:46 30000 1 22.3
#> 2 epoxy1 10 20/07/2022 13:56 31000 2 22.6
#> 3 epoxy1 10 20/07/2022 14:06 32000 3 22.5
#>
#> [[2]]
#> # A tibble: 3 × 6
#> paint thickness date_time frequency index temperature
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 otherpaint 20 20/07/2022 13:10 20000 1 24.5
#> 2 otherpaint 20 20/07/2022 13:20 21000 2 24.6
#> 3 otherpaint 20 20/07/2022 14:30 22000 3 24.8
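For comparison, a base R sketch of the same tag → join → split idea, working directly on the in-memory list_of_files from the question rather than on files on disk. It assumes every name has the form <kind>_<paint>_<thickness>; the helpers tag_with_name() and stack_kind() are hypothetical names, and the value columns stay character as in the question's sample.

```r
list_of_files <- list(
  solid_epoxy1_10 = data.frame(
    date_time = c("20/07/2022 13:46", "20/07/2022 13:56", "20/07/2022 14:06"),
    frequency = c("30000", "31000", "32000"), index = c("1", "2", "3"),
    stringsAsFactors = FALSE),
  solid_otherpaint_20 = data.frame(
    date_time = c("20/07/2022 13:10", "20/07/2022 13:20", "20/07/2022 14:30"),
    frequency = c("20000", "21000", "22000"), index = c("1", "2", "3"),
    stringsAsFactors = FALSE),
  water_epoxy1_10 = data.frame(
    date_time = c("20/07/2022 13:46", "20/07/2022 13:56", "20/07/2022 14:06"),
    temperature = c("22.3", "22.6", "22.5"), stringsAsFactors = FALSE),
  water_otherpaint_20 = data.frame(
    date_time = c("20/07/2022 13:10", "20/07/2022 13:20", "20/07/2022 14:30"),
    temperature = c("24.5", "24.6", "24.8"), stringsAsFactors = FALSE)
)

# Hypothetical helper: add paint/thickness parsed from "<kind>_<paint>_<thickness>"
tag_with_name <- function(df, name) {
  parts <- strsplit(name, "_", fixed = TRUE)[[1]]  # e.g. c("solid", "epoxy1", "10")
  df$paint <- parts[2]
  df$thickness <- parts[3]
  df
}

# Hypothetical helper: stack all data frames whose name starts with `kind`
stack_kind <- function(files, kind) {
  keep <- startsWith(names(files), kind)
  out <- do.call(rbind, Map(tag_with_name, files[keep], names(files)[keep]))
  rownames(out) <- NULL
  out
}

solid <- stack_kind(list_of_files, "solid")
water <- stack_kind(list_of_files, "water")

# Inner join on the shared keys, then one data frame per paint
joined <- merge(solid, water, by = c("date_time", "paint", "thickness"))
by_paint <- split(joined, joined$paint)
by_paint
```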

Get nearest n matching strings

Hi, I am trying to match strings from one data frame against strings in another data frame and get the nearest n matches based on a similarity score.
E.g. for each value in the string_2 column (df_2), I need to match it against string_1 (df_1) and get the nearest 3 matches within each ID group.
ID = c(100, 100,100,100,103,103,103,103,104,104,104,104)
string_1 = c("Jack Daniel","Jac","JackDan","steve","Mark","Dukes","Allan","Duke","Puma Nike","Puma","Nike","Addidas")
df_1 = data.frame(ID,string_1)
ID = c(100, 100, 185, 103,103, 104, 104,104)
string_2 = c("Jack Daniel","Mark","Order","Steve","Mark 2","Nike","Addidas","Reebok")
df_2 = data.frame(ID,string_2)
My output data frame df_out should look like this:
ID = c(100, 100,185,103,103,104,104,104)
string_2 = c("Jack Daniel","Mark","Order","Steve","Mark 2","Nike","Addidas","Reebok")
nearest_str_match_1 = c("Jack Daniel","JackDan","NA","Duke","Mark","Nike","Addidas","Nike")
nearest_str_match_2 =c("JackDan","Jack Daniel","NA","Dukes","Duke","Addidas","Nike","Puma Nike")
nearest_str_match_3 =c("Jac","Jac","NA","Allan","Allan","Puma","Puma","Addidas")
df_out = data.frame(ID,string_2,nearest_str_match_1,nearest_str_match_2,nearest_str_match_3)
I have tried it manually with the "stringdist" package ('jw' method) to get the nearest value:
stringdist::stringdist("Jack Daniel","Jack Daniel","jw")
stringdist::stringdist("Jack Daniel","Jac","jw")
stringdist::stringdist("Jack Daniel","JackDan","jw")
Thanks in advance
library(dplyr)
library(tidyr)
# merge() is an inner join, so ID 185 ("Order"), which has no candidates in df_1, is dropped
merge(df_1, df_2, by = 'ID') %>%
group_by(string_2) %>%
mutate(dist = (stringdist::stringdist(string_2, string_1, 'jw')) %>%
rank(ties.method = 'last')) %>%
slice_min(dist, n = 3) %>%
pivot_wider(names_from = dist, names_prefix = 'nearest_str_match_',
values_from = string_1)
# A tibble: 7 x 5
# Groups: string_2 [7]
ID string_2 nearest_str_match_1 nearest_str_match_2 nearest_str_match_3
<dbl> <chr> <chr> <chr> <chr>
1 104 Addidas Addidas Nike Puma
2 100 Jack Daniel Jack Daniel JackDan Jac
3 100 Mark JackDan Jack Daniel Jac
4 103 Mark 2 Mark Duke Allan
5 104 Nike Nike Addidas Puma
6 104 Reebok Nike Puma Nike Addidas
7 103 Steve Duke Dukes Allan
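If you would rather avoid the stringdist dependency, here is a base R sketch using adist(). Note that adist() computes a generalized Levenshtein (edit) distance rather than the Jaro-Winkler distance used above, so the order of some candidates can differ. Unlike the merge() approach, it keeps IDs with no candidates (185, "Order") as a row of NA. The function name nearest_matches() is hypothetical.

```r
df_1 <- data.frame(
  ID = c(100, 100, 100, 100, 103, 103, 103, 103, 104, 104, 104, 104),
  string_1 = c("Jack Daniel", "Jac", "JackDan", "steve", "Mark", "Dukes", "Allan", "Duke",
               "Puma Nike", "Puma", "Nike", "Addidas"),
  stringsAsFactors = FALSE
)
df_2 <- data.frame(
  ID = c(100, 100, 185, 103, 103, 104, 104, 104),
  string_2 = c("Jack Daniel", "Mark", "Order", "Steve", "Mark 2", "Nike", "Addidas", "Reebok"),
  stringsAsFactors = FALSE
)

nearest_matches <- function(df_1, df_2, n = 3) {
  matches <- vapply(seq_len(nrow(df_2)), function(i) {
    candidates <- df_1$string_1[df_1$ID == df_2$ID[i]]
    if (length(candidates) == 0) return(rep(NA_character_, n))
    # Edit distance from this string_2 to every candidate in the same ID group
    d <- adist(df_2$string_2[i], candidates, ignore.case = TRUE)[1, ]
    # Closest first; indexing past the end pads with NA if there are fewer than n candidates
    candidates[order(d)][seq_len(n)]
  }, character(n))
  # vapply() gives one column per df_2 row; transpose to one row per df_2 row
  matches <- t(matrix(matches, nrow = n))
  colnames(matches) <- paste0("nearest_str_match_", seq_len(n))
  cbind(df_2, as.data.frame(matches, stringsAsFactors = FALSE))
}

df_out <- nearest_matches(df_1, df_2)
df_out
```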

R : Generate a dataframe from list of possible values

newdata <- tibble::tibble( # valid values shown below
dvcat="10-24", # "1-9" "10-24" "25-39" "40-54" "55+"
seatbelt="none", # "none" "belted"
frontal="frontal", # "notfrontal" "frontal"
sex="f", # "f" "m"
ageOFocc=22, # age in years, 16-97
yearVeh=2002, # year of vehicle, 1955-2003
airbag="none", # "none" "airbag"
occRole="pass" # "driver" "pass"
)
dvcat seatbelt frontal sex ageOFocc yearVeh airbag occRole
1 10-24 none frontal f 22 2002 none pass
I want to generate possible combinations of the variables above and put them into a tibble.
For example, let's say I want a dataset with 3 rows, where each value is picked at random to create a new row.
dvcat seatbelt frontal sex ageOFocc yearVeh airbag occRole
1 10-24 none frontal f 22 2002 none pass
2 25-39 none frontal m 54 2010 none driver
3 40-54 belted frontal f 14 2016 airbag driver
If we have a list of values to pick, then use
library(purrr)
map_dfr(lst1, ~ sample(.x, 3, replace = TRUE))
# A tibble: 3 x 8
# dvcat seatbelt frontal sex ageOFocc yearVeh airbag occRole
# <chr> <chr> <chr> <chr> <int> <int> <chr> <chr>
#1 40-54 none notfrontal f 71 1997 none driver
#2 40-54 none frontal m 87 1974 airbag driver
#3 25-39 belted notfrontal m 56 2001 none driver
Or in base R
data.frame(lapply(lst1, sample, size = 3, replace = TRUE))
data
lst1 <- list(dvcat = c("1-9", "10-24", "25-39", "40-54", "55+"),
seatbelt = c("none", "belted"),
frontal = c("notfrontal", "frontal"),
sex = c("f", "m"),
ageOFocc = 16:97,
yearVeh = 1955:2003,
airbag = c("none", "airbag"),
occRole = c("driver", "pass"))

R How to summarize two different groups after initial group_by

I would like to do the following in one go, instead of computing two separate results and taking their union:
delivery_stats= data.frame(service=c("UberEats", "Seamless","UberEats", "Seamless"),
status = c("OnTime", "OnTime", "Late", "Late"),
totals = c(235, 488, 32, 58))
ds1 = filter(delivery_stats, service =="UberEats") %>%
group_by(service, status) %>%
summarise(count_status = sum(totals)) %>%
mutate(avg_of_status = count_status/sum(count_status))
#now do the same for Seamless, then union...
Provided I have understood you correctly, do you mean this?
library(dplyr)
delivery_stats %>%
group_by(service) %>%
mutate(n = sum(totals)) %>%
group_by(service, status) %>%
transmute(
count_status = totals,
avg_of_status = count_status/n)
## A tibble: 4 x 4
## Groups: service, status [4]
# service status count_status avg_of_status
# <fct> <fct> <dbl> <dbl>
#1 UberEats OnTime 235 0.880
#2 Seamless OnTime 488 0.894
#3 UberEats Late 32 0.120
#4 Seamless Late 58 0.106
Explanation: first group by service to calculate the sum of totals for each service; then group by service and status, keep totals as count_status, and divide it by the service sum to get avg_of_status.
You can also try base R, using ave() inside within():
res <- within(delivery_stats, {
count_status <- ave(totals, service, status, FUN=mean)
avg_of_status <- count_status / ave(totals, service, FUN=sum)
})
res
# service status totals avg_of_status count_status
# 1 UberEats OnTime 235 0.8801498 235
# 2 Seamless OnTime 488 0.8937729 488
# 3 UberEats Late 32 0.1198502 32
# 4 Seamless Late 58 0.1062271 58
As pointed out above, I didn't need to filter at all; the same code works for both groups:
delivery_stats= data.frame(service=c("UberEats", "Seamless","UberEats", "Seamless"),
status = c("OnTime", "OnTime", "Late", "Late"),
totals = c(235, 488, 32, 58))
ds1 = group_by(delivery_stats, service, status) %>%
summarise(count_status = sum(totals)) %>%
mutate(avg_of_status = count_status/sum(count_status))
# A tibble: 4 x 4
# Groups: service [2]
service status count_status avg_of_status
<fct> <fct> <dbl> <dbl>
1 Seamless Late 58 0.106
2 Seamless OnTime 488 0.894
3 UberEats Late 32 0.120
4 UberEats OnTime 235 0.880
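Another base R route, sketched here, is to build a service × status table of totals with xtabs() and let prop.table() divide each row by its service total (margin = 1). The result is a table rather than a tibble; as.data.frame() turns it back into long format if needed.

```r
delivery_stats <- data.frame(service = c("UberEats", "Seamless", "UberEats", "Seamless"),
                             status = c("OnTime", "OnTime", "Late", "Late"),
                             totals = c(235, 488, 32, 58))

# Sum of totals for each service/status pair
counts <- xtabs(totals ~ service + status, data = delivery_stats)
# Share of each status within its service: every row sums to 1
shares <- prop.table(counts, margin = 1)
round(shares, 3)

# Back to a long data frame (the share lands in a column called Freq)
as.data.frame(shares)
```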

Using dplyr's summarise on single column, but with multiple parameter values

Apologies for the not-so-clear title (suggestions welcome); hopefully the example below clarifies things. I have the following data frame of basketball shot results (1 row == 1 basketball shot):
> dput(zed)
structure(list(shooterTeamAlias = c("DUKE", "DUKE", "BC", "DUKE",
"DUKE", "DUKE", "DUKE", "DUKE", "DUKE", "BC", "BC", "BC", "DUKE",
"BC", "BC", "DUKE", "DUKE", "DUKE", "BC", "DUKE"), distanceCategory = c("sht2",
"sht2", "sht3", "atr2", "mid2", "sht2", "lng3", "sht3", "atr2",
"sht3", "sht3", "sht2", "mid2", "sht3", "sht3", "sht3", "atr2",
"atr2", "sht2", "mid2"), eventType = c("twopointmiss", "twopointmade",
"threepointmade", "twopointmade", "twopointmiss", "twopointmade",
"threepointmiss", "threepointmiss", "twopointmade", "threepointmiss",
"threepointmade", "twopointmiss", "twopointmade", "threepointmiss",
"threepointmade", "threepointmiss", "twopointmade", "twopointmade",
"twopointmade", "twopointmade")), row.names = c(NA, 20L), class = "data.frame")
> zed
shooterTeamAlias distanceCategory eventType
1 DUKE sht2 twopointmiss
2 DUKE sht2 twopointmade
3 BC sht3 threepointmade
4 DUKE atr2 twopointmade
5 DUKE mid2 twopointmiss
6 DUKE sht2 twopointmade
7 DUKE lng3 threepointmiss
8 DUKE sht3 threepointmiss
9 DUKE atr2 twopointmade
10 BC sht3 threepointmiss
11 BC sht3 threepointmade
12 BC sht2 twopointmiss
13 DUKE mid2 twopointmade
14 BC sht3 threepointmiss
15 BC sht3 threepointmade
16 DUKE sht3 threepointmiss
17 DUKE atr2 twopointmade
18 DUKE atr2 twopointmade
19 BC sht2 twopointmade
20 DUKE mid2 twopointmade
This data frame is currently in a tidy-ish format, and I need to group by team and then widen it considerably. The full data has 6 distance categories, atr2, sht2, mid2, lng2, sht3 and lng3 (the example above has only 5), as well as 2 categories that are functions of the other 6: all2 combines atr2, sht2, lng2 and mid2, and all3 combines sht3 and lng3. For each of these 8 categories I would like a column for makes, attempts, pct, and attempt frequency. I use the eventType column to determine whether a shot was made. I am currently doing this with the following:
fat.data <- {zed %>%
dplyr::group_by(shooterTeamAlias) %>%
dplyr::summarise(
shotsCount = n(),
# Shooting By Distance Stats
atr2Made = sum(distanceCategory == "atr2" & eventType == "twopointmade"),
atr2Att = sum(distanceCategory == "atr2" & eventType %in% c("twopointmiss", "twopointmade")),
atr2AttFreq = atr2Att / shotsCount,
atr2Pct = ifelse(atr2Att > 0, atr2Made / atr2Att, 0),
sht2Made = sum(distanceCategory == "sht2" & eventType == "twopointmade"),
sht2Att = sum(distanceCategory == "sht2" & eventType %in% c("twopointmiss", "twopointmade")),
sht2AttFreq = sht2Att / shotsCount,
sht2Pct = ifelse(sht2Att > 0, sht2Made / sht2Att, 0),
mid2Made = sum(distanceCategory == "mid2" & eventType == "twopointmade"),
mid2Att = sum(distanceCategory == "mid2" & eventType %in% c("twopointmiss", "twopointmade")),
mid2AttFreq = mid2Att / shotsCount,
mid2Pct = ifelse(mid2Att > 0, mid2Made / mid2Att, 0),
lng2Made = sum(distanceCategory == "lng2" & eventType == "twopointmade"),
lng2Att = sum(distanceCategory == "lng2" & eventType %in% c("twopointmiss", "twopointmade")),
lng2AttFreq = lng2Att / shotsCount,
lng2Pct = ifelse(lng2Att > 0, lng2Made / lng2Att, 0),
all2Made = sum(atr2Made, sht2Made, mid2Made, lng2Made),
all2Att = sum(atr2Att, sht2Att, mid2Att, lng2Att),
all2AttFreq = all2Att / shotsCount,
all2Pct = ifelse(all2Att > 0, all2Made / all2Att, 0),
sht3Made = sum(distanceCategory == "sht3" & eventType == "threepointmade"),
sht3Att = sum(distanceCategory == "sht3" & eventType %in% c("threepointmiss", "threepointmade")),
sht3AttFreq = sht3Att / shotsCount,
sht3Pct = ifelse(sht3Att > 0, sht3Made / sht3Att, 0),
lng3Made = sum(distanceCategory == "lng3" & eventType == "threepointmade"),
lng3Att = sum(distanceCategory == "lng3" & eventType %in% c("threepointmiss", "threepointmade")),
lng3AttFreq = lng3Att / shotsCount,
lng3Pct = ifelse(lng3Att > 0, lng3Made / lng3Att, 0),
all3Made = sum(sht3Made, lng3Made),
all3Att = sum(sht3Att, lng3Att),
all3AttFreq = all3Att / shotsCount,
all3Pct = ifelse(all3Att > 0, all3Made / all3Att, 0))}
...for the 6 categories that appear in the data (all but all2 and all3), the 4 columns are all computed in the same manner. As you'll see, the calculations for all2 and all3 are a bit different.
Setting aside all2 and all3 for the time being, is there a better way to compute the makes, attempts, pct, and attempt frequency for the 6 categories in the data? With 8 categories * 4 column types == 32 columns it's not so bad, but I have another, similar case with 21 categories * 4 column types, and I have to do this multiple times in my code.
I'm not sure whether dplyr::group_by + dplyr::summarise is my best option (obviously it's what I'm using currently), or if there's a better way to go about this. Improving this code and potentially speeding it up is very important for my project; any help is appreciated, and I'll try to remember to put a bounty on this post even if it is answered in the next 2 days.
Edit: I've just realized that grouping by distanceCategory first, computing the 4 stats for each distanceCategory, and then restructuring that data frame into this wide format may be easier. It is something I'm working on currently, along these lines:
zed %>%
dplyr::group_by(shooterTeamAlias, distanceCategory) %>%
dplyr::summarise(
attempts = ...,
makes = ...,
pct = ...,
attfreq = ...
) %>%
tidyr::spread(...)
Thanks!!
This looks like it could be made simpler by grouping by distanceCategory and then applying the same logic to each:
library(tidyverse)
zed %>%
group_by(shooterTeamAlias, distanceCategory) %>%
summarize(att = n(), # n() counts how many rows (shots) are in this group
made = sum(str_detect(eventType, "made")),
pct = if_else(att > 0, made / att, 0)) %>%
mutate(freq = att / sum(att))
# A tibble: 7 x 6
# Groups: shooterTeamAlias [2]
shooterTeamAlias distanceCategory att made pct freq
<chr> <chr> <int> <int> <dbl> <dbl>
1 BC sht2 2 1 0.5 0.286
2 BC sht3 5 3 0.6 0.714
3 DUKE atr2 4 4 1 0.308
4 DUKE lng3 1 0 0 0.0769
5 DUKE mid2 3 2 0.667 0.231
6 DUKE sht2 3 2 0.667 0.231
7 DUKE sht3 2 0 0 0.154
If you want that in wide format, you could first gather the calculations above, unite the distance with the stat, and then spread by that:
[same code as above] %>%
gather(stat, value, -distanceCategory, -shooterTeamAlias) %>%
unite(stat, distanceCategory, stat) %>%
spread(stat, value)
# A tibble: 2 x 21
# Groups: shooterTeamAlias [2]
shooterTeamAlias atr2_att atr2_freq atr2_made atr2_pct lng3_att lng3_freq lng3_made lng3_pct mid2_att mid2_freq mid2_made mid2_pct sht2_att sht2_freq sht2_made sht2_pct sht3_att sht3_freq sht3_made sht3_pct
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 BC NA NA NA NA NA NA NA NA NA NA NA NA 2 0.286 1 0.5 5 0.714 3 0.6
2 DUKE 4 0.308 4 1 1 0.0769 0 0 3 0.231 2 0.667 3 0.231 2 0.667 2 0.154 0 0
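The answer above sets all2 and all3 aside. One way to fold them in, sketched in base R, is to append a second copy of every shot whose distanceCategory is replaced by its roll-up (categories ending in 2 go to all2, those ending in 3 to all3, matching the definitions in the question), run the same per-category summary, and compute attempt frequency against each team's real shot count so the duplicated rows do not inflate it. reshape() then produces one row per team with columns such as att_all2; the object names are illustrative.

```r
zed <- data.frame(
  shooterTeamAlias = c("DUKE", "DUKE", "BC", "DUKE", "DUKE", "DUKE", "DUKE", "DUKE", "DUKE", "BC",
                       "BC", "BC", "DUKE", "BC", "BC", "DUKE", "DUKE", "DUKE", "BC", "DUKE"),
  distanceCategory = c("sht2", "sht2", "sht3", "atr2", "mid2", "sht2", "lng3", "sht3", "atr2", "sht3",
                       "sht3", "sht2", "mid2", "sht3", "sht3", "sht3", "atr2", "atr2", "sht2", "mid2"),
  eventType = c("twopointmiss", "twopointmade", "threepointmade", "twopointmade", "twopointmiss",
                "twopointmade", "threepointmiss", "threepointmiss", "twopointmade", "threepointmiss",
                "threepointmade", "twopointmiss", "twopointmade", "threepointmiss", "threepointmade",
                "threepointmiss", "twopointmade", "twopointmade", "twopointmade", "twopointmade"),
  stringsAsFactors = FALSE
)
zed$made <- endsWith(zed$eventType, "made")

# Roll-up copy: every shot also counts toward all2 or all3
rollup <- zed
rollup$distanceCategory <- ifelse(endsWith(zed$distanceCategory, "2"), "all2", "all3")
shots <- rbind(zed, rollup)

grp <- shots[c("shooterTeamAlias", "distanceCategory")]
stats <- merge(
  aggregate(list(att = shots$made), by = grp, FUN = length),  # attempts
  aggregate(list(made = shots$made), by = grp, FUN = sum)     # makes
)
stats$pct <- ifelse(stats$att > 0, stats$made / stats$att, 0)

# Frequency against each team's real number of shots (zed, not the doubled rows)
team_shots <- table(zed$shooterTeamAlias)
stats$freq <- stats$att / as.vector(team_shots[stats$shooterTeamAlias])

# One row per team, columns like att_sht2, made_all3, pct_lng3, freq_all2
wide <- reshape(stats, idvar = "shooterTeamAlias", timevar = "distanceCategory",
                direction = "wide", sep = "_")
wide
```

Adding a new category only means adding rows to the data; no extra summarise() lines are needed. Categories a team never shot from (e.g. atr2 for BC) come out as NA, as in the spread() result above.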
