grouping a character vector into new groups using dplyr

I have a data frame that looks like this:
# A tibble: 5 x 5
# Groups: Trial [1]
GID Trial pop `1A-1145442` `1A-1158042`
<chr> <chr> <chr> <int> <int>
GID421213 ES1 ES1-5 12 11
GID419903 ES1 ES1-5 22 12
GID3881 ES1 ES1-5 22 22
GID13646 ES1 ES1-5 12 12
GID418846 ES1 ES1-5 22 11
Here is a dput of it :
structure(list(GID = c("GID421213", "GID419903", "GID3881", "GID13646",
"GID418846"), Trial = c("ES1", "ES1", "ES1", "ES1", "ES1"), pop = c("ES1-5",
"ES1-5", "ES1-5", "ES1-5", "ES1-5"), `1A-1145442` = c(12L, 22L,
22L, 12L, 22L), `1A-1158042` = c(11L, 12L, 22L, 12L, 11L)), row.names =
c(NA, -5L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars =
"Trial", drop = TRUE, indices = list(0:4), group_sizes = 5L,
biggest_group_size = 5L, labels = structure(list(Trial = "ES1"), row.names
= c(NA, -1L), class = "data.frame", vars = "Trial", drop = TRUE))
I want to create a new column that regroups the Trial column, just as I did in the past for the pop column using regex operations, but now with dplyr. The Trial column consists of ES values from 1 to 38, and I would like to group them as ES1-3, ES4-6, ES7-9 and so forth using the dplyr package. I know I could start with df %>% group_by(Trial), but from there on I have no idea how to proceed.

library(dplyr)
df %>%
mutate(pop2 = case_when(
Trial == "ES1" | Trial == "ES2" | Trial == "ES3" ~ "ES1-3",
Trial == "ES4" | Trial == "ES5" | Trial == "ES6" ~ "ES4-6"
))
Will return
# A tibble: 5 x 6
# Groups: Trial [1]
GID Trial pop `1A-1145442` `1A-1158042` pop2
<chr> <chr> <chr> <int> <int> <chr>
1 GID421213 ES1 ES1-5 12 11 ES1-3
2 GID419903 ES1 ES1-5 22 12 ES1-3
3 GID3881 ES1 ES1-5 22 22 ES1-3
4 GID13646 ES1 ES1-5 12 12 ES1-3
5 GID418846 ES1 ES1-5 22 11 ES1-3

Given
(df <- data.frame(Trial = paste0("ES", 1:10)))
# Trial
# 1 ES1
# 2 ES2
# 3 ES3
# 4 ES4
# 5 ES5
# 6 ES6
# 7 ES7
# 8 ES8
# 9 ES9
# 10 ES10
We may, using base R, do
size <- 3
groups <- (as.numeric(substring(df$Trial, 3)) - 1) %/% size
(df$newCol <- sprintf("ES%d-%d", 1 + groups * size, size * (1 + groups)))
# [1] "ES1-3" "ES1-3" "ES1-3" "ES4-6" "ES4-6" "ES4-6" "ES7-9" "ES7-9"
# [9] "ES7-9" "ES10-12"
Here as.numeric(substring(df$Trial, 3)) gets the numeric part of df$Trial and converts it to a numeric vector. Subtracting 1 and using %/% then returns the group number for each element of df$Trial, starting from 0. Given a group number, we can easily construct a new column with sprintf.
size is the size of groups. E.g., setting size <- 5 would give values ES1-5, ES6-10, and so on.
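As a rough sketch, the same arithmetic can be wrapped in a small helper. The function name label_trial_groups and its max_trial argument are hypothetical and not part of the answer above; max_trial caps the upper bound so that the last label does not run past the highest trial (38 in the question).

```r
# Hypothetical helper: label each trial with its "ESa-b" group.
# Assumes every value looks like "ES<number>".
label_trial_groups <- function(trial, size = 3, max_trial = 38) {
  num   <- as.numeric(substring(trial, 3))   # "ES10" -> 10
  grp   <- (num - 1) %/% size                # 0-based group index
  lower <- 1 + grp * size                    # first trial in the group
  upper <- pmin(size * (1 + grp), max_trial) # last trial, capped at max_trial
  sprintf("ES%d-%d", lower, upper)
}

label_trial_groups(c("ES1", "ES3", "ES4", "ES37", "ES38"))
# [1] "ES1-3"   "ES1-3"   "ES4-6"   "ES37-38" "ES37-38"
```

With dplyr this could then be used as df %>% mutate(newCol = label_trial_groups(Trial)).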

Here's a solution that uses parse_number from readr.
library(dplyr)
library(readr)
df %>%
mutate(grp = cut(parse_number(Trial),
breaks = seq(1, 38, by = 3),
right = FALSE)) %>%
group_by(grp)
This pulls out the number from Trial then cuts to create a grouping variable, which it then groups by. right=FALSE indicates that the interval is closed on the left.
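To see what right = FALSE does at the edges, here is a small base R check on a few hand-picked trial numbers (the vector trial_num is made up for illustration). With breaks taken from seq(1, 38, by = 3), the last break is 37, so 37 and 38 fall outside every interval and come back as NA.

```r
# Hand-picked trial numbers, including the two largest ones
trial_num <- c(1, 3, 4, 36, 37, 38)

# Breaks are 1, 4, 7, ..., 34, 37
breaks <- seq(1, 38, by = 3)

# right = FALSE gives left-closed intervals [a, b)
grp <- cut(trial_num, breaks = breaks, right = FALSE)
as.character(grp)
# [1] "[1,4)"   "[1,4)"   "[4,7)"   "[34,37)" NA        NA
```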
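An edit based on a comment below: the last break is set to 38 and include.lowest = TRUE is passed to cut, so that trials 34 to 38 fall into a final closed interval.
df %>%
  mutate(grp = cut(parse_number(Trial),
                   breaks = c(seq(1, 34, by = 3), 38),
                   right = FALSE,
                   include.lowest = TRUE)) %>%
  group_by(grp)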

Related

Long to wider format

I have lab records of 30,000 unique ID's. I need to convert my data from long to wider format for each ID and TEST_DATE related to that unique ID.
Example for one ID :
I need to convert this to a wider format like this:
I have a dataset with 30,000 ID's and I need to do this for each ID. The ID with the maximum number of tests will determine our number of columns.
I will appreciate any ideas that you might have to solve this problem! Thank you

Try this:
library(dplyr)
library(tidyr)
#Code
new <- df %>%
group_by(ACCT,TEST_DATE) %>%
summarise(RESULT=round(mean(RESULT,na.rm=T),2)) %>%
ungroup() %>%
mutate(across(-ACCT,~as.character(.))) %>%
pivot_longer(-ACCT) %>%
group_by(ACCT,name) %>%
mutate(name=paste0(name,row_number())) %>%
pivot_wider(names_from = name,values_from=value) %>%
mutate(across(starts_with('RESULT'),~as.numeric(.)))
Output:
# A tibble: 2 x 7
# Groups: ACCT [2]
ACCT TEST_DATE1 RESULT1 TEST_DATE2 RESULT2 TEST_DATE3 RESULT3
<int> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 37733 9/1/2016 3 10/18/2016 2 11/1/2016 1
2 37734 9/1/2016 5 10/18/2016 4 11/1/2016 3
Some data used:
#Data
df <- structure(list(ACCT = c(37733L, 37733L, 37733L, 37734L, 37734L,
37734L), TEST_DATE = c("9/1/2016", "10/18/2016", "11/1/2016",
"9/1/2016", "10/18/2016", "11/1/2016"), RESULT = c(3L, 2L, 1L,
5L, 4L, 3L)), class = "data.frame", row.names = c(NA, -6L))
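The same idea (number the tests within each ID, then spread by that number) can also be sketched in base R with reshape. This is only an illustration on the sample data above; the helper column test_no is a made-up name, and the sketch assumes the rows are already in the desired order within each ACCT.

```r
# Sample data copied from the answer above
df <- data.frame(
  ACCT      = c(37733L, 37733L, 37733L, 37734L, 37734L, 37734L),
  TEST_DATE = c("9/1/2016", "10/18/2016", "11/1/2016",
                "9/1/2016", "10/18/2016", "11/1/2016"),
  RESULT    = c(3L, 2L, 1L, 5L, 4L, 3L)
)

# Running test number within each ACCT (1, 2, 3, ...)
df$test_no <- ave(seq_along(df$ACCT), df$ACCT, FUN = seq_along)

# One row per ACCT; columns become TEST_DATE.1, RESULT.1, TEST_DATE.2, ...
wide <- reshape(df, idvar = "ACCT", timevar = "test_no", direction = "wide")
wide
```

An ID with fewer tests than the maximum simply gets NA in the trailing columns.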

Here is a data.table option with dcast that might help (borrowing the data from @Duck)
library(data.table)
dcast(setDT(df)[, Q := seq(.N), ACCT], ACCT ~ Q, value.var = c("TEST_DATE", "RESULT"))
ACCT TEST_DATE_1 TEST_DATE_2 TEST_DATE_3 RESULT_1 RESULT_2 RESULT_3
1: 37733 9/1/2016 10/18/2016 11/1/2016 3 2 1
2: 37734 9/1/2016 10/18/2016 11/1/2016 5 4 3
Another option is using melt along with dcast, where the resulting format might be the one you are exactly after
suppressWarnings({
type.convert(
dcast(
melt(
setDT(df)[, Q := seq(.N), ACCT],
id = c("ACCT", "Q"),
measure = c("TEST_DATE", "RESULT")
)[order(ACCT, Q)],
ACCT ~ Q + variable,
value.var = "value"
),
as.is = TRUE
)
})
which gives
ACCT 1_TEST_DATE 1_RESULT 2_TEST_DATE 2_RESULT 3_TEST_DATE 3_RESULT
1: 37733 9/1/2016 3 10/18/2016 2 11/1/2016 1
2: 37734 9/1/2016 5 10/18/2016 4 11/1/2016 3

Take this simple route
library(tidyverse)
df %>% group_by(ACCT, TEST_DATE) %>% summarise(RESULT = mean(RESULT)) %>%
group_by(ACCT) %>% mutate(testno = row_number(), resultno = row_number()) %>%
pivot_wider(id_cols = ACCT, names_from = c("testno", "resultno"), values_from = c(TEST_DATE, RESULT))
# A tibble: 2 x 9
# Groups: ACCT [2]
ACCT TEST_DATE_1_1 TEST_DATE_2_2 TEST_DATE_3_3 TEST_DATE_4_4 RESULT_1_1 RESULT_2_2 RESULT_3_3 RESULT_4_4
<int> <date> <date> <date> <date> <dbl> <dbl> <dbl> <dbl>
1 37733 2016-01-07 2016-01-09 2016-01-11 2016-08-10 5 4.5 1 2
2 37734 2016-01-21 2016-08-20 NA NA 3 4 NA NA
data (dput) used
> dput(df)
structure(list(ACCT = c(37733L, 37733L, 37733L, 37733L, 37734L,
37734L, 37733L), TEST_DATE = structure(c(16809, 17023, 16811,
16807, 17033, 16821, 16809), class = "Date"), RESULT = c(3L,
2L, 1L, 5L, 4L, 3L, 6L)), row.names = c(NA, -7L), class = "data.frame")
df
> df
ACCT TEST_DATE RESULT
1 37733 2016-01-09 3
2 37733 2016-08-10 2
3 37733 2016-01-11 1
4 37733 2016-01-07 5
5 37734 2016-08-20 4
6 37734 2016-01-21 3
7 37733 2016-01-09 6

Transform to wide format from long in R

I have a data frame in R which looks like below
Model Month Demand Inventory
A Jan 10 20
B Feb 30 40
A Feb 40 60
I want the data frame to look
Jan Feb
A_Demand 10 40
A_Inventory 20 60
A_coverage
B_Demand 30
B_Inventory 40
B_coverage
A_coverage and B_coverage will be calculated in Excel using a formula. The problem I need help with is pivoting the data frame from long format (the original format) to wide format.
I tried to implement the solution from the linked duplicate but I am still having difficulty:
HD_dcast <- reshape(data,idvar = c("Model","Inventory","Demand"),
timevar = "Month", direction = "wide")
Here is a dput of my data:
data <- structure(list(Model = c("A", "B", "A"), Month = c("Jan", "Feb",
"Feb"), Demand = c(10L, 30L, 40L), Inventory = c(20L, 40L, 60L
)), class = "data.frame", row.names = c(NA, -3L))
Thanks

Here's an approach with dplyr and tidyr, two popular R packages for data manipulation:
library(dplyr)
library(tidyr)
data %>%
mutate(coverage = NA_real_) %>%
pivot_longer(-c(Model,Month), names_to = "Variable") %>%
pivot_wider(id_cols = c(Model, Variable), names_from = Month ) %>%
unite(Variable, c(Model,Variable), sep = "_")
## A tibble: 6 x 3
# Variable Jan Feb
# <chr> <dbl> <dbl>
#1 A_Demand 10 40
#2 A_Inventory 20 60
#3 A_coverage NA NA
#4 B_Demand NA 30
#5 B_Inventory NA 40
#6 B_coverage NA NA

Calculating age over multiple dataframes based on name of dataframe

I was wondering if someone here can help me with a lapply question.
Every month, data are extracted and the data frames are named according to the date extracted (01-08-2019,01-09-2019,01-10-2019 etc). The contents of each data frame are similar to the example below:
01-09-2019
ID DOB
3 01-07-2019
5 01-06-2019
7 01-05-2019
8 01-09-2019
01-10-2019
ID DOB
2 01-10-2019
5 01-06-2019
8 01-09-2019
9 01-02-2019
As the months roll on, there are more data sets being downloaded.
I am wanting to calculate the ages of people in each of the data sets based on the date the data was extracted - so in essence, the age would be the date difference between the data frame name and the DOB variable.
01-09-2019
ID DOB AGE(months)
3 01-07-2019 2
5 01-06-2019 3
7 01-05-2019 4
8 01-09-2019 0
01-10-2019
ID DOB AGE(months)
2 01-10-2019 0
5 01-06-2019 4
8 01-09-2019 1
9 01-02-2019 8
I was thinking of putting all of the data frames together in a list (as there are a lot) and then using lapply to calculate age across all data frames. How do I go about calculating the difference between a data frame name and a column?

If I may suggest a slightly different approach: it might make more sense to compress your list into a single data frame before calculating the ages. Suppose your data looks something like this, i.e. it is a list of data frames where the list element names are the dates of access:
$`01-09-2019`
# A tibble: 4 x 2
ID DOB
<dbl> <date>
1 3 2019-07-01
2 5 2019-06-01
3 7 2019-05-01
4 8 2019-09-01
$`01-10-2019`
# A tibble: 4 x 2
ID DOB
<dbl> <date>
1 2 2019-10-01
2 5 2019-06-01
3 8 2019-09-01
4 9 2019-02-01
You can call bind_rows first with parameter .id = "date_extracted" to turn your list into a data frame, and then calculate age in months.
library(tidyverse)
library(lubridate)
tib <- bind_rows(tib_list, .id = "date_extracted") %>%
mutate(date_extracted = dmy(date_extracted),
DOB = dmy(DOB),
age_months = month(date_extracted) - month(DOB)
)
#### OUTPUT ####
# A tibble: 8 x 4
date_extracted ID DOB age_months
<date> <dbl> <date> <dbl>
1 2019-09-01 3 2019-07-01 2
2 2019-09-01 5 2019-06-01 3
3 2019-09-01 7 2019-05-01 4
4 2019-09-01 8 2019-09-01 0
5 2019-10-01 2 2019-10-01 0
6 2019-10-01 5 2019-06-01 4
7 2019-10-01 8 2019-09-01 1
8 2019-10-01 9 2019-02-01 8
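Note that subtracting month numbers only works while both dates fall in the same calendar year, as they do in the example. A minimal base R sketch that also accounts for the year is shown here; the function name months_between is hypothetical, and it counts whole calendar months while ignoring the day of the month.

```r
# Hypothetical helper: calendar months between two dates, ignoring the day
months_between <- function(later, earlier) {
  l <- as.POSIXlt(later)
  e <- as.POSIXlt(earlier)
  # $year is years since 1900 and $mon is 0-11; only differences matter here
  (l$year - e$year) * 12 + (l$mon - e$mon)
}

months_between(as.Date("2019-10-01"), as.Date("2019-02-01"))
# [1] 8
months_between(as.Date("2020-01-01"), as.Date("2019-10-01"))
# [1] 3   (plain month subtraction would give 1 - 10 = -9)
```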

This can be solved with lapply as well, but in this case we can also use Map to iterate over the list and its names after putting all the data frames in a list. The dates are in day-month-year form, so as.Date needs an explicit format. In base R,
Map(function(x, y) {
  # Dates are stored as "dd-mm-yyyy", so the format must be given explicitly
  x$DOB <- as.Date(x$DOB, format = "%d-%m-%Y")
  transform(x, age = as.integer(format(as.Date(y, format = "%d-%m-%Y"), "%m")) -
              as.integer(format(x$DOB, "%m")))
}, list_df, names(list_df))
#$`01-09-2019`
#  ID        DOB age
#1  3 2019-07-01   2
#2  5 2019-06-01   3
#3  7 2019-05-01   4
#4  8 2019-09-01   0
#$`01-10-2019`
#  ID        DOB age
#1  2 2019-10-01   0
#2  5 2019-06-01   4
#3  8 2019-09-01   1
#4  9 2019-02-01   8
We can also do the same in tidyverse
library(dplyr)
library(lubridate)
# dmy() parses the "dd-mm-yyyy" strings before month() is applied
purrr::imap(list_df, ~ .x %>% mutate(age = month(dmy(.y)) - month(dmy(as.character(DOB)))))
data
list_df <- list(`01-09-2019` = structure(list(ID = c(3L, 5L, 7L, 8L),
DOB = structure(c(3L, 2L, 1L, 4L), .Label = c("01-05-2019", "01-06-2019",
"01-07-2019", "01-09-2019"), class = "factor")), class = "data.frame",
row.names = c(NA, -4L)), `01-10-2019` = structure(list(ID = c(2L, 5L, 8L, 9L),
DOB = structure(c(4L, 2L, 3L, 1L), .Label = c("01-02-2019",
"01-06-2019", "01-09-2019", "01-10-2019"), class = "factor")),
class = "data.frame", row.names = c(NA, -4L)))

It's bad practice to use dates and numbers as data frame names. Consider prefixing the date with an "x", as shown below in this base R solution (note that Age is computed here as a difference in days):
df_list <- list(x01_09_2019 = `01-09-2019`, x01_10_2019 = `01-10-2019`)
df_list <- mapply(cbind, "report_date" = names(df_list), df_list, SIMPLIFY = F)
df_list <- lapply(df_list, function(x){
x$report_date <- as.Date(gsub("_", "-", gsub("x", "", x$report_date)), "%d-%m-%Y")
x$Age <- x$report_date - x$DOB
return(x)
}
)
Data:
`01-09-2019` <- structure(list(ID = c(3, 5, 7, 8),
DOB = structure(c(18078, 18048, 18017, 18140), class = "Date")),
class = "data.frame", row.names = c(NA, -4L))
`01-10-2019` <- structure(list(ID = c(2, 5, 8, 9),
DOB = structure(c(18170, 18048, 18140, 17928), class = "Date")),
class = "data.frame", row.names = c(NA, -4L))

How to make a frequency table from a data frame in R

The data frame is like this:
header: system
Row 1: 00000000000000000503_0
Row 2: 00000000000000000503_1
Row 3: 00000000000000000503_2
Row 4: 00000000000000000503_3
Row 5: 000000000000000004e7_0
Row 6: 000000000000000004e7_1
Row 7: 00000000000000000681_0
Row 8: 00000000000000000681_1
Row 9: 00000000000000000681_2
I want to generate a frequency table with the quantities of the code before string "_" such that:
"00000000000000000503" appears 4 times, "000000000000000004e7" appears 2 times, and so on.
How do I do this in R?

Remove everything after underscore and use table to count frequency
table(sub("_.*", "", data$col1))
#Also
#table(sub("(.*)_.*", "\\1", data$col1))
#000000000000000004e7 00000000000000000503 00000000000000000681
# 2 4 3
If final output needs to be a dataframe use stack
stack(table(sub("_.*", "", data$col1)))
# values ind
#1 2 000000000000000004e7
#2 4 00000000000000000503
#3 3 00000000000000000681
data
data <- structure(list(col1 = structure(c(3L, 4L, 5L, 6L, 1L, 2L, 7L,
8L, 9L), .Label = c("000000000000000004e7_0", "000000000000000004e7_1",
"00000000000000000503_0", "00000000000000000503_1",
"00000000000000000503_2",
"00000000000000000503_3", "00000000000000000681_0",
"00000000000000000681_1",
"00000000000000000681_2"), class = "factor")), class = "data.frame",
row.names = c(NA, -9L))
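If named columns are preferred over the values/ind pair that stack produces, as.data.frame on the table is another base R option. This is a small sketch on the same nine strings; the variable names system and code are chosen here for illustration.

```r
# The nine values from the question
system <- c("00000000000000000503_0", "00000000000000000503_1",
            "00000000000000000503_2", "00000000000000000503_3",
            "000000000000000004e7_0", "000000000000000004e7_1",
            "00000000000000000681_0", "00000000000000000681_1",
            "00000000000000000681_2")

# Drop the underscore and everything after it
code <- sub("_.*", "", system)

# table() counts each code; as.data.frame() yields columns `code` and `Freq`
freq <- as.data.frame(table(code), stringsAsFactors = FALSE)
freq
```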

A dplyr-tidyr alternative:
df %>%
tidyr::separate(V3, c("target", "non_target")) %>%
count(target)
# A tibble: 3 x 2
target n
<chr> <int>
1 000000000000000004e7 2
2 00000000000000000503 4
3 00000000000000000681 3
With base:
table(sapply(strsplit(df$V3, "_"), "[[", 1))
Data:
df <- structure(list(V1 = c("Row", "Row", "Row", "Row", "Row", "Row",
"Row", "Row", "Row"), V2 = c("1:", "2:", "3:", "4:", "5:", "6:",
"7:", "8:", "9:"), V3 = c("00000000000000000503_0", "00000000000000000503_1",
"00000000000000000503_2", "00000000000000000503_3", "000000000000000004e7_0",
"000000000000000004e7_1", "00000000000000000681_0", "00000000000000000681_1",
"00000000000000000681_2")), class = "data.frame", row.names = c(NA,
-9L))

Another option using the stringr library that is included in tidyverse
> library(tidyverse)
> mydata <- data.frame(system = c("00000000000000000503_0",
"00000000000000000503_1",
"00000000000000000503_2",
"00000000000000000503_3",
"000000000000000004e7_0",
"000000000000000004e7_1",
"00000000000000000681_0",
"00000000000000000681_1",
"00000000000000000681_2"))
> mydata
system
1 00000000000000000503_0
2 00000000000000000503_1
3 00000000000000000503_2
4 00000000000000000503_3
5 000000000000000004e7_0
6 000000000000000004e7_1
7 00000000000000000681_0
8 00000000000000000681_1
9 00000000000000000681_2
> # Split data using str_split
> mydata$leftside <- sapply(mydata$system, function(x) unlist(str_split(x, "_"))[1]) #split string by the "_" and take first piece
> mydata$rightside <- sapply(mydata$system, function(x) unlist(str_split(x, "_"))[2]) #split string by the "_" and take second piece
>
> mydata
system leftside rightside
1 00000000000000000503_0 00000000000000000503 0
2 00000000000000000503_1 00000000000000000503 1
3 00000000000000000503_2 00000000000000000503 2
4 00000000000000000503_3 00000000000000000503 3
5 000000000000000004e7_0 000000000000000004e7 0
6 000000000000000004e7_1 000000000000000004e7 1
7 00000000000000000681_0 00000000000000000681 0
8 00000000000000000681_1 00000000000000000681 1
9 00000000000000000681_2 00000000000000000681 2
> # alternative tabulation function to base::table(); can provide nicer options.
> xtabs(data = mydata, formula = ~leftside)
leftside
000000000000000004e7 00000000000000000503 00000000000000000681
2 4 3

A tidyverse answer would be
my_data <- mydata %>%
mutate_if(is.factor, as.character) %>%
mutate(system = gsub('_[^_]*$', '', system)) %>%
group_by(system) %>%
count() %>%
ungroup()
my_data

An option with str_remove and group_by
library(stringr)
library(dplyr)
df %>%
group_by(V3 = str_remove(V3, "_\\d+$")) %>%
summarise(n = n())
# A tibble: 3 x 2
# V3 n
# <chr> <int>
#1 000000000000000004e7 2
#2 00000000000000000503 4
#3 00000000000000000681 3
Or in base R with table and trimws
table(trimws(df$V3, whitespace = "_[0-9]+"))
data
df <- structure(list(V1 = c("Row", "Row", "Row", "Row", "Row", "Row",
"Row", "Row", "Row"), V2 = c("1:", "2:", "3:", "4:", "5:", "6:",
"7:", "8:", "9:"), V3 = c("00000000000000000503_0", "00000000000000000503_1",
"00000000000000000503_2", "00000000000000000503_3", "000000000000000004e7_0",
"000000000000000004e7_1", "00000000000000000681_0", "00000000000000000681_1",
"00000000000000000681_2")), class = "data.frame", row.names = c(NA,
-9L))

Is it possible to combine summarise with summarise_at in a single group_by with dplyr

Edit: just realized the side column in the data isn't used at all, so please disregard it for the purposes of the example.
I have a large dataframe of play-by-play basketball data, and I would like to perform a group_by, summarise and summarise_at on my data. Below is a subset of my dataframe:
> dput(zed)
structure(list(side = c("right", "right", "right", "right", "right",
"right", "left", "right", "right", "right", "left", "right",
"left", "left", "left", "right", "right", "right", "left", "right"
), result = c("twopointmiss", "twopointmade", "twopointmade",
"twopointmiss", "twopointmade", "twopointmade", "twopointmiss",
"twopointmade", "twopointmade", "twopointmade", "twopointmade",
"twopointmade", "twopointmiss", "twopointmiss", "twopointmiss",
"twopointmiss", "twopointmade", "twopointmade", "twopointmiss",
"twopointmiss"), zonenumber = c(1, 1, 1, 1, 2, 3, 2, 3, 2, 3,
4, 4, 4, 1, 1, 2, 3, 2, 3, 4), team = c("Bos", "Bos", "Bos",
"Bos", "Bos", "Bos", "Bos", "Bos", "Bos", "Bos", "Min", "Min",
"Min", "Min", "Min", "Min", "Min", "Min", "Min", "Min")), row.names = c(3L,
5L, 8L, 14L, 17L, 23L, 28L, 30L, 39L, 41L, 42L, 43L, 47L, 52L,
54L, 58L, 60L, 63L, 69L, 72L), class = "data.frame")
> zed
side result zonenumber team
3 right twopointmiss 1 Bos
5 right twopointmade 1 Bos
8 right twopointmade 1 Bos
14 right twopointmiss 1 Bos
17 right twopointmade 2 Bos
23 right twopointmade 3 Bos
28 left twopointmiss 2 Bos
30 right twopointmade 3 Bos
39 right twopointmade 2 Bos
41 right twopointmade 3 Bos
42 left twopointmade 4 Min
43 right twopointmade 4 Min
47 left twopointmiss 4 Min
52 left twopointmiss 1 Min
54 left twopointmiss 1 Min
58 right twopointmiss 2 Min
60 right twopointmade 3 Min
63 right twopointmade 2 Min
69 left twopointmiss 3 Min
72 right twopointmiss 4 Min
In the example below, I only use summarise, as I'm currently not sure how to use summarise and summarise_at with the same group_by call:
> grouped.df <- zed %>%
+ dplyr::group_by(team) %>%
+ dplyr::summarise(
+ shotsMade = sum(result == "twopointmade"),
+ shotsAtt = n(),
+ shotsPct = round(shotsMade / shotsAtt),
+ points = 2 * shotsMade,
+
+ z1Made = sum(zonenumber == 1),
+ z2Made = sum(zonenumber == 2),
+ z3Made = sum(zonenumber == 3),
+ z4Made = sum(zonenumber == 4)
+ )
> grouped.df
# A tibble: 2 x 9
team shotsMade shotsAtt shotsPct points z1Made z2Made z3Made z4Made
<chr> <int> <int> <dbl> <dbl> <int> <int> <int> <int>
1 Bos 7 10 1 14 4 3 3 0
2 Min 4 10 0 8 2 2 2 4
In the example above, I'd like to create the first 4 columns (shotsMade, shotsAtt, shotsPct, points) in summarise, and create the z#Made columns with a summarise_at. In my full data, there are ~30 unique-ish columns that I plan on creating with summarise, and ~80 similar-ish columns that I plan on creating with summarise_at.
For sake of a small example, I didn't want to bring my entire dataframe in for this example. If I am able to implement both summarise and summarise_at in the example above, then I'll be able to do it for my full data frame as well.
Any thoughts on this are greatly appreciated, as I am particularly keen on improving with the _at functions in dplyr. Thanks!

I don't think there is a way to actually use both summarise and summarise_at as clearly we wouldn't be able to execute the second one after losing many rows and columns.
So, instead we may use mutate, mutate_at, and then drop certain rows (and perhaps columns). The difference between this and somehow magically applying summarise and summarise_at is that the mutate approach will not drop any variables. Whether that is a good thing depends on your use case. Below I add an extra line, select(-one_of(setdiff(names(zed), "team"))), that drops all the columns that the summarise combo would drop.
zed$zonenumber2 <- zed$zonenumber # Example
zed %>%
group_by(team) %>%
mutate(
shotsMade = sum(result == "twopointmade"),
shotsAtt = n(),
shotsPct = round(shotsMade / shotsAtt),
points = 2 * shotsMade) %>%
mutate_at(
vars(contains("zone")),
.funs = funs(Made1 = sum(. == 1), Made2 = sum(. == 2),
Made3 = sum(. == 3), Made4 = sum(. == 4))) %>%
filter(!duplicated(team)) %>%
select(-one_of(setdiff(names(zed), "team"))) # May want to remove
# A tibble: 2 x 13
# Groups: team [2]
# team shotsMade shotsAtt shotsPct points zonenumber_Made1 zonenumber2_Mad… zonenumber_Made2
# <chr> <int> <int> <dbl> <dbl> <int> <int> <int>
# 1 Bos 7 10 1 14 4 4 3
# 2 Min 4 10 0 8 2 2 2
# … with 5 more variables: zonenumber2_Made2 <int>, zonenumber_Made3 <int>,
# zonenumber2_Made3 <int>, zonenumber_Made4 <int>, zonenumber2_Made4 <int>
