Creating a Shiny App for Fantasy Football Draft Optimization - r
I've written some R code to produce the optimal fantasy football lineup (maximize projected points scored) constrained on user inputted roster sizes and draft budget based on a data frame called "players" that consists of player, position, fantasy points, and draft value.
The idea is to use this tool prior to drafting (to have the ideal lineup in mind) and then to update it live while drafting since this never goes to plan.
From there, I want to be able to remove players from the players dataset (when they're drafted by others) and add players to my lineup as I draft them (so they appear in every future optimal lineup). I have added the functionality and th remove player button seems to work fairly well (for some reason the lineup vanishes when you input a new player each time, but then re-appears properly once a new player is removed) but it definitely doesn't seem to draft a player to the team properly. I'm definitely thinking it has something to do with the last snippet of code before running the app, but I'm having trouble thinking through the logic there.
The dataframe is:
players <- structure(list(Player = c("Josh Allen", "Patrick Mahomes", "Justin Herbert",
"Lamar Jackson", "Kyler Murray", "Jalen Hurts", "Tom Brady",
"Dak Prescott", "Joe Burrow", "Russell Wilson", "Aaron Rodgers",
"Trey Lance", "Matthew Stafford", "Kirk Cousins", "Derek Carr",
"Tua Tagovailoa", "Justin Fields", "Trevor Lawrence", "Ryan Tannehill",
"Daniel Jones", "Matt Ryan", "Jameis Winston", "Carson Wentz",
"Mac Jones", "Jared Goff", "Zach Wilson", "Davis Mills", "Baker Mayfield",
"Marcus Mariota", "Deshaun Watson", "Mitchell Trubisky", "Geno Smith",
"Drew Lock", "Kenny Pickett", "Jacoby Brissett", "Desmond Ridder",
"Travis Kelce", "Mark Andrews", "Kyle Pitts", "Darren Waller",
"George Kittle", "Dalton Schultz", "T.J. Hockenson", "Dallas Goedert",
"Zach Ertz", "Dawson Knox", "Hunter Henry", "Mike Gesicki", "Pat Freiermuth",
"Cole Kmet", "Irv Smith Jr.", "Noah Fant", "Tyler Higbee", "David Njoku",
"Albert Okwuegbunam", "Gerald Everett", "Robert Tonyan", "Jonathan Taylor",
"Christian McCaffrey", "Derrick Henry", "Austin Ekeler", "Dalvin Cook",
"Joe Mixon", "Najee Harris", "Alvin Kamara", "D'Andre Swift",
"Leonard Fournette", "Saquon Barkley", "Aaron Jones", "Nick Chubb",
"James Conner", "Javonte Williams", "Ezekiel Elliott", "David Montgomery",
"Cam Akers", "Travis Etienne Jr.", "Breece Hall", "J.K. Dobbins",
"Josh Jacobs", "Antonio Gibson", "Elijah Mitchell", "AJ Dillon",
"Cordarrelle Patterson", "Damien Harris", "Miles Sanders", "Clyde Edwards-Helaire",
"Tony Pollard", "Devin Singletary", "Kareem Hunt", "Chase Edmonds",
"Rashaad Penny", "Rhamondre Stevenson", "Kenneth Walker III",
"Melvin Gordon III", "Darrell Henderson Jr.", "James Robinson",
"James Cook", "Dameon Pierce", "Michael Carter", "Jamaal Williams",
"Nyheim Hines", "J.D. McKissic", "Kenneth Gainwell", "Alexander Mattison",
"Isaiah Spiller", "Raheem Mostert", "Mark Ingram II", "Marlon Mack",
"Brian Robinson", "Gus Edwards", "Rex Burkhead", "Rachaad White",
"Khalil Herbert", "Damien Williams", "Tyler Allgeier", "D'Onta Foreman",
"Jerick McKinnon", "Cooper Kupp", "Justin Jefferson", "Ja'Marr Chase",
"Davante Adams", "Stefon Diggs", "Deebo Samuel", "CeeDee Lamb",
"Mike Evans", "Tyreek Hill", "Tee Higgins", "Keenan Allen", "DJ Moore",
"A.J. Brown", "Michael Pittman Jr.", "Mike Williams", "Brandin Cooks",
"Jaylen Waddle", "Diontae Johnson", "Terry McLaurin", "DK Metcalf",
"Courtland Sutton", "Amon-Ra St. Brown", "Darnell Mooney", "Allen Robinson II",
"Marquise Brown", "Amari Cooper", "Gabriel Davis", "Chris Godwin",
"Michael Thomas", "Jerry Jeudy", "Adam Thielen", "JuJu Smith-Schuster",
"Hunter Renfrow", "Rashod Bateman", "Elijah Moore", "Tyler Lockett",
"Christian Kirk", "Robert Woods", "DeVonta Smith", "Drake London",
"Allen Lazard", "Brandon Aiyuk", "Chase Claypool", "Kadarius Toney",
"Tyler Boyd", "Garrett Wilson", "DeVante Parker", "Chris Olave",
"Kenny Golladay", "Jakobi Meyers", "Russell Gage", "Marquez Valdes-Scantling",
"DeAndre Hopkins", "Marvin Jones Jr.", "Treylon Burks", "Michael Gallup",
"Robbie Anderson", "DJ Chark", "Jahan Dotson", "Mecole Hardman"
), Position = c("QB", "QB", "QB", "QB", "QB", "QB", "QB", "QB",
"QB", "QB", "QB", "QB", "QB", "QB", "QB", "QB", "QB", "QB", "QB",
"QB", "QB", "QB", "QB", "QB", "QB", "QB", "QB", "QB", "QB", "QB",
"QB", "QB", "QB", "QB", "QB", "QB", "TE", "TE", "TE", "TE", "TE",
"TE", "TE", "TE", "TE", "TE", "TE", "TE", "TE", "TE", "TE", "TE",
"TE", "TE", "TE", "TE", "TE", "RB", "RB", "RB", "RB", "RB", "RB",
"RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB",
"RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB",
"RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB",
"RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB",
"RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "RB", "WR",
"WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR",
"WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR",
"WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR",
"WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR",
"WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR",
"WR", "WR", "WR", "WR"), FantasyPoints = c(445, 410, 407, 348,
351, 359, 354, 364, 402, 368, 353, 347, 349, 335, 366, 325, 297,
313, 273, 283, 302, 284, 275, 296, 291, 0, 247, 286, 276, 0,
0, 0, 0, 269, 0, 0, 252, 231, 206, 171, 185, 177, 174, 169, 169,
171, 139, 131, 170, 170, 162, 129, 162, 119, 130, 126, 130, 340,
285, 260, 278, 277, 271, 277, 247, 271, 225, 247, 249, 230, 196,
268, 205, 199, 213, 231, 220, 177, 176, 159, 178, 185, 155, 181,
157, 190, 177, 164, 156, 166, 169, 179, 158, 129, 147, 99, 158,
176, 150, 100, 157, 128, 156, 124, 98, 95, 75, 90, 136, 80, 82,
143, 128, 0, 147, 97, 63, 326, 337, 308, 299, 269, 267, 271,
242, 243, 241, 239, 243, 242, 244, 209, 220, 233, 239, 221, 198,
221, 209, 220, 209, 218, 178, 224, 183, 186, 203, 188, 164, 207,
211, 202, 173, 188, 163, 199, 171, 181, 182, 140, 170, 175, 144,
142, 164, 147, 131, 170, 160, 182, 136, 153, 157, 152, 148, 175,
144), DraftValue = c(31, 23, 20, 15, 16, 14, 16, 11, 12, 10,
10, 3, 7, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 37, 34, 22, 20, 17, 16, 12, 11, 9, 6, 4, 4,
5, 5, 2, 2, 2, 1, 1, 1, 1, 56, 55, 44, 48, 38, 38, 40, 38, 36,
34, 34, 33, 27, 30, 28, 27, 23, 21, 23, 21, 19, 18, 10, 15, 16,
16, 12, 12, 14, 13, 10, 11, 12, 8, 9, 1, 6, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 56, 48, 41,
40, 37, 31, 34, 29, 30, 28, 28, 26, 24, 26, 23, 23, 22, 21, 20,
18, 19, 20, 17, 18, 17, 15, 15, 17, 17, 16, 16, 15, 15, 13, 12,
12, 12, 11, 9, 9, 9, 7, 5, 6, 4, 2, 2, 2, 1, 3, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-177L))
and the code is below:
library(shiny)
library(lpSolve)
library(rsconnect)
# Define the UI for the app
ui <- fluidPage(
titlePanel("Fantasy Football Lineup Optimizer"),
sidebarLayout(
sidebarPanel(
numericInput("num_qb", "Enter the number of QBs:", 1, min = 1, max = 5),
numericInput("num_rb", "Enter the number of RBs:", 2, min = 1, max = 5),
numericInput("num_wr", "Enter the number of WRs:", 3, min = 1, max = 5),
numericInput("num_te", "Enter the number of TEs:", 1, min = 1, max = 5),
numericInput("num_value", "Enter your draft budget:", 200),
numericInput("num_players", "Adding in your flex spots, enter the total number of starters:", 9, min = 1, max = 15),
selectInput("remove", "Remove a player:", choices = c("",as.character(players$Player)), multiple = TRUE),
actionButton("update", "Update Team"),
selectInput("draft", "Draft Player", choices = c("",as.character(players$Player)), multiple = TRUE),
actionButton("draft_button", "Draft")
),
mainPanel(
tableOutput("team")
)
)
)
# Define the server logic
server <- function(input, output) {
players <- players
# Create a new column indicating the player's position
players$QB <- ifelse(players$Position == "QB", 1, 0)
players$RB <- ifelse(players$Position == "RB", 1, 0)
players$WR <- ifelse(players$Position == "WR", 1, 0)
players$TE <- ifelse(players$Position == "TE", 1, 0)
players$Total <- 1
rv <- reactiveValues(players=players)
# Define the objective function (maximize fantasy points)
obj <- players$FantasyPoints
# Define the constraints (position limits and draft value limit)
con <- reactive({
matrix(c(
# QB constraint
rv$players$QB,
# RB constraint
rv$players$RB,
# WR constraint
rv$players$WR,
# TE constraint
rv$players$TE,
# Draft value constraint
rv$players$DraftValue,
#Total players constraint
rv$players$Total
), ncol = nrow(rv$players), byrow = TRUE)
})
# Define the variables for the lp
dir <- c("<=", rep(">=",3),"<=","<=")
# Define the initial optimal lineup
initialLineup <- reactive({
rhs <- reactive({
c(input$num_qb, input$num_rb, input$num_wr, input$num_te, input$num_value, input$num_players)
})
result <- lp("max", obj, con(), dir, rhs(), all.bin = TRUE)
rv$players[result$solution == 1,]
})
# Show the updated optimal team in a table for any constraint change
output$team <- renderTable({
lineupResult()[, c("Player", "Position", "DraftValue", "FantasyPoints")]
})
# Define the function to run when the "update" button is pressed
updateLineup <- eventReactive(input$update, {
removedPlayer <- input$remove
rv$players <- rv$players[rv$players$Player != removedPlayer,]
obj <- rv$players$FantasyPoints
rhs <- reactive({
c(input$num_qb, input$num_rb, input$num_wr, input$num_te, input$num_value, input$num_players)
})
result <- lp("max", obj, con(), dir, rhs(), all.bin = TRUE)
rv$players[result$solution == 1,]
})
# Define the function to run when the "draft player" button is pressed
draftPlayer <- eventReactive(input$draft, {
draftedPlayer <- input$draft_player
draftedPlayers <- rv$players[rv$players$Player == draftedPlayer,]
rv$players <- rv$players[rv$players$Player != draftedPlayer,]
rv$draftedPlayers <- rbind(rv$draftedPlayers, draftedPlayers)
rhs <- reactive({
c(input$num_qb, input$num_rb, input$num_wr, input$num_te, input$num_value, input$num_players)
})
result <- lp("max", obj, con(), dir, rhs(), all.bin = TRUE)
rv$players[result$solution == 1,]
rv$players <- rbind(rv$players, rv$draftedPlayers)
})
# Show the updated optimal team in a table when the "update" button is pressed
output$team <- renderTable({
if (is.null(input$draft_player)) {
if (is.null(input$remove)) {
initialLineup()[, c("Player", "Position", "FantasyPoints", "DraftValue")]
} else {
updateLineup()[, c("Player", "Position", "FantasyPoints", "DraftValue")]
}
} else {
draftPlayer()[, c("Player", "Position", "FantasyPoints", "DraftValue")]
}
})
}
# Run the app
shinyApp(ui, server)
I gave this one a shot. I modified your code quite a bit to reduce some of the complexity to it but first I'll point out a few issues I saw before I rewrote portions of it.
The reason why your lineup vanishes and then re-appears at times is due to how you link up your renderTable to your inputs and eventReactives.
output$team has dependencies on input$draft_player and input$remove. First off I don't see input$draft_player in your UI so I assume this is meant to be input$draft_button. With that said, when removing a player, the renderTable function invalidates first when you enter a player into the select input. But updateLineup() is dependent on input$update so it doesn't return anything until you click "Update Team". Thus causing the delay.
In your draftPlayer expression you haven't set obj to be the new value of rv$players$FantasyPoints so instead lp() takes the value of obj in the parent environment which has the full set of players and thus err's out.
One additional thing I noticed is in both functions you return rv$players[result$solution == 1,]. Personally, I think the problem with this is you'll always output the most optimized full lineup regardless of your own draft selections. Intuitively, I would think you'd want to return the best lineup not including the positions you've already drafted. So if a QB and 2 WR's have been drafted. Then you'd return a lineup with only 6 players since 3 have already been drafted.
Below, I've written some code that takes that last piece into account as well as reduce the number of output functions. This is just what made sense to me and maybe I'm off base here, but hopefully it's along the same lines of what you're trying to accomplish!
Let me know if you have any questions.
The goal with the code below, is to include your draft picks alongside the optimized selection. When a player is drafted, we need to do a few things.
remove the player from the pool
update const.rhs with new constraints
subtract the position requirement by 1
subtract the overall number of players by 1
subtract the draft budget by the players draftValue
When a player is removed they are simply 'removed' from the player pool and lp will run without that player.
With the new constraints in place, lp will return a new lineup with the number of available players and positions that have yet to be drafted using a modified draft budget. After a player has been drafted or removed, they will be removed from the select inputs so you can't accidentally choose them in the future.
Restrictions
In my version I have a few constraints to be aware of. For simplicity sake, you can only remove or draft one player at a time. Due to how const.rhs is calculated, the sum of the position inputs MUST match the total number of starters. I know that could be an issue as after the 5 or 6th round, I'll debate on whether to pick up an RB or WR depending on who's available. Also I would advise not changing any input above remove player once the draft has started since that will likely screw things up.
App Code
library(shiny)
library(lpSolve)
library(purrr)
# Define the UI for the app
ui <- fluidPage(
titlePanel("Fantasy Football Lineup Optimizer"),
sidebarLayout(
sidebarPanel(
numericInput("num_qb", "Enter the number of QBs:", 1, min = 1, max = 5),
numericInput("num_rb", "Enter the number of RBs:", 3, min = 1, max = 5),
numericInput("num_wr", "Enter the number of WRs:", 3, min = 1, max = 5),
numericInput("num_te", "Enter the number of TEs:", 2, min = 1, max = 5),
numericInput("num_value", "Enter your draft budget:", 200),
numericInput("num_players", "Adding in your flex spots, enter the total number of starters:", 9, min = 1, max = 15),
selectInput("remove", "Remove a player:", choices = c("",as.character(players$Player)), multiple = FALSE),
selectInput("draft_player", "Draft Player", choices = c("",as.character(players$Player)), multiple = FALSE),
actionButton("update", "Update Lineup")
),
mainPanel(
tableOutput("team")
)
)
)
# Define the server logic
server <- function(input, output, session) {
players <- players
# New col to indicate if a player has been drafted
players$Drafted = "No"
# Create a new column indicating the player's position
players$QB <- ifelse(players$Position == "QB", 1, 0)
players$RB <- ifelse(players$Position == "RB", 1, 0)
players$WR <- ifelse(players$Position == "WR", 1, 0)
players$TE <- ifelse(players$Position == "TE", 1, 0)
players$Total <- 1
rv <- reactiveValues(players=players)
# Set up reactive table for lineup output
updateLineup = reactiveVal(NULL)
# Define the objective function (maximize fantasy points)
obj <- players$FantasyPoints
# Define the constraints (position limits and draft value limit)
con <- reactive({
matrix(c(
# QB constraint
rv$players$QB,
# RB constraint
rv$players$RB,
# WR constraint
rv$players$WR,
# TE constraint
rv$players$TE,
# Draft value constraint
rv$players$DraftValue,
#Total players constraint
rv$players$Total
), ncol = nrow(rv$players), byrow = TRUE)
})
# Define the variables for the lp
dir <- c("<=", rep(">=",3),"<=","<=")
# Define initial 'const.rhs'
init_rhs <- reactive({
list(
QB = input$num_qb,
RB = input$num_rb,
WR = input$num_wr,
TE = input$num_te,
n_val = input$num_value,
n_players = input$num_players
)
})
# Define reactive 'const.rhs'
rhs = reactiveValues(const = list())
# Run once to get the initial values and set them to reactiveValues
# so they can be changed later
observeEvent(init_rhs(),{
rhs$const = init_rhs()
}, once = TRUE)
# Define the initial optimal lineup
initialLineup <- reactive({
result <- lp("max", obj, con(), dir, init_rhs(), all.bin = TRUE)
rv$players[result$solution == 1,]
})
# Define the function to run when the "update" button is pressed
observeEvent(input$update, {
# Remove player here
if(input$remove != "") {
removedPlayer <- input$remove
rv$players <- rv$players[rv$players$Player != removedPlayer,]
obj <- rv$players$FantasyPoints
}
# Draft player
if(input$draft_player != "") {
draftedPlayer <- input$draft_player
draftedPlayer_details <- rv$players[rv$players$Player == draftedPlayer,]
draftedPlayer_details$Drafted = "Yes"
rv$players <- rv$players[rv$players$Player != draftedPlayer,]
rv$draftedPlayers <- rbind(rv$draftedPlayers, draftedPlayer_details)
obj <- rv$players$FantasyPoints # missing object
# Subtract constraints: position and n_players by 1 and draft budget by the players 'DraftValue'
# Necessary so "result" outputs a table with the remaining positions left
# otherwise it will return an entirely new lineup
rhs$const = purrr::imap(rhs$const, function(cs, nm) {
if(nm == draftedPlayer_details$Position) {cs = cs - 1}
if(nm == "n_players") {cs = cs - 1}
if(nm == "n_val") {cs = cs - draftedPlayer_details$DraftValue}
return(cs)
})
}
# Update select inputs to remove players after "Update Lineup" is clicked
if(input$remove != "" || input$draft_player != "") {
updateSelectInput(session, inputId = "remove", choices = c("",rv$players), selected = "")
updateSelectInput(session, inputId = "draft_player", choices = c("",rv$players), selected = "")
}
# Define result with updated arguments
result <- lp("max", obj, con(), dir, rhs$const, all.bin = TRUE)
# Assign new table to the reactiveVal 'updateLineup'
updateLineup(rbind(rv$draftedPlayers, rv$players[result$solution == 1,]))
})
output$team <- renderTable({
if (input$update == 0) {
initialLineup()[, c("Player", "Position", "FantasyPoints", "DraftValue", "Drafted")]
} else {
updateLineup()[, c("Player", "Position", "FantasyPoints", "DraftValue", "Drafted")]
}
})
}
# Run the app
shinyApp(ui, server)
Related
Change the column width on a tableGrob using ttheme_minimal
I'm trying to build a table using tableGrob in gridExtra and can't figure out how to set the column with. I am using ttheme_minimal(base_size = 25) to set the size of the text, but I can't seem to find a way in that function to se the column widths. Below is the full code that I'm using: team_free_agents <- structure(list(Player = c("Adrian Amos", "Dean Lowry", "Marcedes Lewis", "Allen Lazard", "Robert Tonyan", "Jarran Reed", "Randall Cobb", "Rudy Ford", "Yosh Nijman", "Keisean Nixon", "Justin Hollins" ), Position = c("S", "IDL", "TE", "WR", "TE", "IDL", "WR", "S", "LT", "CB", "EDGE"), Age = c(30, 29, 39, 28, 29, 31, 33, 29, 27, 26, 27), `Snap %` = c("94.6%", "46.7%", "41.2%", "78.9%", "54%", "68.2%", "33.9%", "42.8%", "69.1%", "28%", "40.7%"), `Current APY` = c("$9,000,000", "$6,775,000", "$4,000,000", "$3,986,000", "$3,750,000", "$3,250,000", "$3,000,000", "$1,137,500", "$965,000", "$965,000", "$706,724" ), `Current Guarantees` = c("$12,000,000", "$6,000,000", "$2,100,000", "$0", "$1,000,000", "$1,865,000", "$0", "$0", "$0", "$0", "$306,896" ), `2022 PFF Grade` = c(54.2, 59.3, 65.6, 69, 57.7, 61.9, 70.1, 77.7, 63.1, 63.9, 54)), row.names = c(NA, -11L), class = c("tbl_df", "tbl", "data.frame")) tt1 <- gridExtra::ttheme_minimal(base_size = 25) fa_table <- gridExtra::tableGrob(team_free_agents, rows = NULL, theme = tt1) cowplot::draw_grob( fa_table, width = 90 ) Any help with this would be appreciated!
How do I write a function to plot a line graph for each factor in a dataframe?
I have a dataframe, the head of which looks like this: |trackName | week| sum| |:--------------------|----:|---:| |New Slang | 1| 493| |You're Somebody Else | 1| 300| |Mushaboom | 1| 297| |San Luis | 1| 296| I am interested in plotting a line graph for each of the 346 unique trackNames in the dataframe, with week on the x-axis and sum on the y-axis. To automate this process, I wrote the following function: charts <- function(df) { songs <- df lim <- nrow(songs) x <- 1 song_names <- as_tibble(unique(songs$trackName)) while (x <= lim) { song <- song_names[x, 1] plot.name <- paste(paste(song), "plot.png", sep = "_") songs %>% filter(trackName == paste(song[x, 1])) %>% ggplot(., aes(x = week, y = sum), group = 1) + geom_line() + labs( x = "Week", y = "Sum of Listens", title = paste("Week by Week Listening Interest for", song, sep = " "), subtitle = "Calculated by plotting the sum of percentages of the song listened per week, starting from first listen" ) + ggsave(plot.name, width = 20, height = 15, units = "cm") x <- x + 1 } } However when I run charts(df), only the following error shows up and then it quits: > charts(mini) geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic? > What am I doing wrong here and what does this error mean? A sample of the dataframe in DPUT format: structure(list(trackName = c("New Slang", "You're Somebody Else", "Mushaboom", "San Luis", "The Trapeze Swinger", "Flightless Bird, American Mouth", "tere bina - Acoustic", "Only for a Moment", "Upward Over the Mountain", "Virginia May", "Never to Be Forgotten Kinda Year", "Little Talks", "Jhak Maar Ke", "Big Rock Candy Mountain", "Sofia", "Aaoge Tum Kabhi", "Deathcab", "Dil Mere", "Choke", "Phir Le Aya Dil", "Lucille", "tere bina - Acoustic", "Dil Mere", "Only for a Moment", "This Is The Life", "San Luis", "Main Bola Hey!", "Choo Lo", "Yeh Zindagi Hai", "Aaftaab", "Never to Be Forgotten Kinda Year", "Khudi", "Flightless Bird, American Mouth", "Mere Bina", "Simple Song", "Dil Haare", "Dil Hi Toh Hai", "You're Somebody Else", "Sofia", "Who's Laughing Now", "Main Bola Hey!", "Lucille", "Eenie Meenie", "tere bina - Acoustic", "New Slang", "Aaftaab", "Mamma Mia", "July", "Yeh Zindagi Hai", "Someone You Loved"), week = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), sum = c(493, 300, 297, 296, 292, 234, 214, 200, 200, 197, 192, 187, 185, 181, 175, 172, 141, 119, 106, 103, 579, 574, 501, 462, 428, 378, 320, 307, 306, 301, 301, 300, 300, 300, 300, 300, 296, 294, 251, 242, 3534, 724, 696, 512, 479, 400, 302, 300, 300, 300)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"))
How about using purrr::walk instead? library(tidyverse) library(hrbrthemes) walk(unique(songs$trackName), ~{ggsave(plot = ggplot(filter(songs, trackName == .x), aes(x = week, y = sum), group = 1) + geom_line(color = ft_cols$yellow) + labs(x = "Week", y = "Sum of Listens", title = paste("Week by Week Listening Interest for", .x, sep = " "), subtitle = "Calculated by plotting the sum of percentages of the song listened per week, starting from first listen") + theme_ft_rc(), file = paste0(.x,"_plot.png"), width = 20, height = 15, units = "cm")}) Note: the question was subsequently edited to remove the hrbrthemes package requirement.
You can split the dataset for each trackName and create a png file for it. library(tidyverse) charts <- function(df) { df %>% group_split(trackName) %>% map(~{ track <- first(.x$trackName) ggplot(.x, aes(x = factor(week), y = sum, group = 1)) + geom_line() + labs( x = "Week", y = "Sum of Listens", title = paste("Week by Week Listening Interest for", track), subtitle = "Calculated by plotting the sum of percentages of the song listened per week, starting from first listen" ) -> plt ggsave(paste0(track,'.png'), plt, width = 20, height = 15, units = "cm") }) } charts(songs)
How to correlate multiple subsets in R
How do I correlate 8 subsets separately against two different dependent variables? I keep getting the same correlation coefficient for the two different subsets (example below). Here is the input: with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength, mean.legit)) with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength, mean.leegauthor)) with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength, mean.legit)) with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength, mean.leegauthor)) Output (I get this for both PARTY_Strength = 1 and 2): Pearson's product-moment correlation data: PARTYID_Strength and mean.legit t = 3.1005, df = 607, p-value = 0.002022 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.0458644 0.2023031 sample estimates: cor 0.1248597 Pearson's product-moment correlation data: PARTYID_Strength and mean.leegauthor t = 2.8474, df = 607, p-value = 0.004557 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.03568431 0.19250344 sample estimates: cor 0.1148091 Sample data: > dput(head(mydata2, 10)) ``structure(list(PARTYID = c(1, 3, 1, 1, 1, 4, 3, 1, 1, 1), PARTYID_Other = c("NA", "NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), PARTYID_Strength = c(1, 7, 1, 2, 1, 8, 1, 6, 1, 1), PARTYID_Strength_Other = c("NA", "NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), THERM_Dem = c(80, 65, 85, 30, 76, 15, 55, 62, 90, 95), THERM_Rep = c(1, 45, 10, 5, 14, 14, 0, 4, 10, 3), Gender = c("Female", "Male", "Male", "Female", "Female", "Male", "Male", "Female", "Female", "Male" ), `MEAN Age` = c(29.5, 49.5, 29.5, 39.5, 29.5, 21, 39.5, 39.5, 29.5, 65), Age = c("25 - 34", "45 - 54", "25 - 34", "35 - 44", "25 - 34", "18 - 24", "35 - 44", "35 - 44", "25 - 34", "65+"), Ethnicity = c("White or Caucasian", "Asian or Asian American", "White or Caucasian", "White or Caucasian", "Hispanic or Latino", "White or Caucasian", "White or Caucasian", "White or Caucasian", "White or Caucasian", "White or Caucasian"), Ethnicity_Other = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"), States = c("Texas", "Texas", "Ohio", "Texas", "Puerto Rico", "New Hampshire", "South Carolina", "Texas", "Texas", "Texas"), Education = c("Master's degree", "Bachelor's degree in college (4-year)", "Bachelor's degree in college (4- year)", "Master's degree", "Master's degree", "Less than high school degree", "Some college but no degree", "Master's degree", "Master's degree", "Some college but no degree"), `MEAN Income` = c(30000, 140000, 150000, 60000, 80000, 30000, 30000, 120000, 150000, 60000 ), Income = c("Less than $30,000", "$130,001 to $150,000", "More than $150,000", "$50,001 to $70,000", "$70,001 to $90,000", "Less than $30,000", "Less than $30,000", "$110,001 to $130,000", "More than $150,000", "$50,001 to $70,000"), mean.partystrength = c(3.875, 2.875, 2.375, 3.5, 2.625, 3.125, 3.375, 3.125, 3.25, 3.625 ), mean.traitrep = c(2.5, 2.625, 2.25, 2.625, 2.75, 1.875, 2.75, 2.875, 2.75, 3), mean.traitdem = c(2.25, 2.625, 2.375, 2.75, 2.625, 2.125, 1.875, 3, 2, 2.5), mean.leegauthor = c(1, 2, 2, 4, 1, 4, 1, 1, 1, 1), mean.legit = c(1.71428571428571, 3.28571428571429, 2.42857142857143, 2.42857142857143, 2.14285714285714, 1.28571428571429, 1.42857142857143, 1.14285714285714, 2.14285714285714, 1.28571428571429)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))`` Thank you!
To run the tests, create a vector of the columns of interest and then sapply an anonymous function to each of them. fixed <- "PARTYID_Strength" cols <- c("mean.leegauthor", "mean.legit") cor_test_result <- sapply(cols, function(x){ fmla <- paste(fixed, x, sep = "+") fmla <- as.formula(paste("~", fmla)) cor.test(fmla, mydata2) }, simplify = FALSE) cor_test_result$mean.leegauthor # # Pearson's product-moment correlation # #data: PARTYID_Strength and mean.leegauthor #t = 1.4804, df = 8, p-value = 0.177 #alternative hypothesis: true correlation is not equal to 0 #95 percent confidence interval: # -0.2343269 0.8462610 #sample estimates: # cor #0.4637152
How to summarise (dplyr) user specified variables reactively in flexdashboard/shiny?
I am trying to develop a shiny dashboard app that is able to produce a bar graph for different outcome variables that can be selected by the user. To do so, I need to subset my data reactively to generate aggregate data frames. I am able to have the code below successfully filter my data reactively, but I am running into trouble when I try to use dplyr::summarise() reactively. Here is my data dput(head(df)) structure( list( geoid = c( "01001020200", "01001020300", "01001020700", "01001020802", "01001021000", "01001021100" ), state = c( "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama" ), county = c( "Autauga County", "Autauga County", "Autauga County", "Autauga County", "Autauga County", "Autauga County" ), ozzone = structure( c(1L, 1L, 2L, 1L, 1L, 1L), .Label = c("non.oz", "oz"), class = "factor" ), tract_type = c( "LICs", "Contiguous", "LICs", "Contiguous", "Contiguous", "LICs" ), investment_score_1_low_10_high = c(4, 6, 9, 10, 5, 6), socioeconomic_change_flag_1_yes_blank_no = c(0, 0, 0, 0, 0, 0), fips_county = c("01001", "01001", "01001", "01001", "01001", "01001"), total_empl = c(51809L, 51809L, 51809L, 51809L, 51809L, 51809L), total_payroll = c(338395L, 338395L, 338395L, 338395L, 338395L, 338395L), total_establishments = c(5090L, 5090L, 5090L, 5090L, 5090L, 5090L), largest_employer = c(72L, 72L, 72L, 72L, 72L, 72L), largest_employer_bypayroll = c(44L, 44L, 44L, 44L, 44L, 44L), trend_employee_change = c( 2735.60000000046, 2735.60000000046, 2735.60000000046, 2735.60000000046, 2735.60000000046, 2735.60000000046 ), trend_payroll_change = c( 23074.8000000037, 23074.8000000037, 23074.8000000037, 23074.8000000037, 23074.8000000037, 23074.8000000037 ), trend_establishment_change = c( 53.4000000000084, 53.4000000000084, 53.4000000000084, 53.4000000000084, 53.4000000000084, 53.4000000000084 ), damage_cost_weather_total = c(20000, 20000, 20000, 20000, 20000, 20000), deaths_weather_total = c(0L, 0L, 0L, 0L, 0L, 0L), medianrent = c(537, 633, 525, 680, 409, 303), vacancyrate = c( 0.108200455580866, 0.113652113652114, 0.0436681222707424, 0.0512166859791425, 0.229962546816479, 0.21030303030303 ), total_pop = c(503, 827, 900, 2989, 740, 813), undertwo_percent = c( 0.391650099403579, 0.351874244256348, 0.397777777777778, 0.17096018735363, 0.301351351351351, 0.263222632226322 ), mobility_rate = c( 0.133702166897188, 0.0737753882915173, 0.196514423076923, 0.172716680111141, 0.0641304347826087, 0.0681084570690769 ), unemploy_rate = c( 0.0176991150442478, 0.0273203592814371, 0.109881724532621, 0.0127906976744186, 0.0344982078853047, 0.0281910728269381 ), median_income = c(41287, 46806, 41250, 64439, 46607, 36450), renter_percent = c( 0.337653478854025, 0.310596310596311, 0.331877729257642, 0.268110942458949, 0.328686327077748, 0.365986394557823 ), blackaa_percent = c( 0.5451197053407, 0.264697193500739, 0.145906432748538, 0.152916262243007, 0.258583690987124, 0.530922930542341 ), hispanic_percent = c( 0.0105893186003683, 0.0803545051698671, 0.0400584795321637, 0.0137651107385511, 0.00822603719599428, 0.00666032350142721 ), transit_score_mean = c(0, 0, 0, 0, 0, 0), life_expectancy = c(75.67, 75.67, 75.67, 75.67, 75.67, 75.67), trend_life_expectancy = c(5.1, 5.1, 5.1, 5.1, 5.1, 5.1), median_monthly_housing_costs = c(885, 885, 885, 885, 885, 885), pestilence_2018 = c(2, 2, 2, 2, 2, 2), total_pop_county = c(6772, 6772, 6772, 6772, 6772, 6772), deaths_weather_pop = c(0, 0, 0, 0, 0, 0), cost_weather_pop = c( 2.95333727111636, 2.95333727111636, 2.95333727111636, 2.95333727111636, 2.95333727111636, 2.95333727111636 ), Male_HSgrad = c(75, 68, 211, 189, 97, 42), Male_SomeCollege = c(28, 18, 51, 111, 74, 38), Male_AssocDeg = c(4, 6, 0, 63, 0, 21), Male_BachDeg = c(7, 9, 0, 11, 0, 9), Male_GradDeg = c(0, 0, 0, 29, 6, 0), MaleEduAboveHS = c(114, 101, 262, 403, 177, 110), Total_Male18.24 = c(145, 123, 285, 455, 202, 110), MaleEduHSAbove_pop = c( 0.786206896551724, 0.821138211382114, 0.919298245614035, 0.885714285714286, 0.876237623762376, 1 ), Female_HSgrad = c(11, 60, 87, 156, 23, 83), Female_SomeCollege = c(22, 25, 13, 47, 54, 65), Female_AssocDeg = c(0, 0, 20, 82, 0, 0), Female_BachDeg = c(5, 26, 0, 19, 0, 11), Female_GradDeg = c(5, 16, 0, 0, 0, 0), FemaleEduAboveHS = c(43, 127, 120, 304, 77, 159), Total_Female18.24 = c(53, 127, 192, 581, 92, 198), FemaleEduHSAbove_pop = c( 0.811320754716981, 1, 0.625, 0.523235800344234, 0.83695652173913, 0.803030303030303 ) ), row.names = c(NA, 6L), class = "data.frame" ) Here is my code #List of potential outcome variables to be plotted variables <- c("total_empl", "total_payroll", "total_establishments", "largest_employer", "largest_employer_bypayroll", "trend_employee_change", "trend_payroll_change", "trend_establishment_change", "damage_cost_weather_total", "deaths_weather_total", "medianrent", "vacancyrate", "total_pop", "undertwo_percent", "mobility_rate", "unemploy_rate", "median_income", "renter_percent", "blackaa_percent", "hispanic_percent", "median_monthly_housing_costs", "MaleEduAboveHS_pop", "FemaleEduHSAbove_pop") # Define inputs selectInput('state_name', label = 'Select a state', choices = lookup) selectInput('DV', label = 'Outcome Measure', choices = variables) #Filter data based on the State and outcome measure the user would like to investigate. bar <- reactive({ st <- df %>% filter(state == input$state_name) bp <- st %>% group_by(tract_type) %>% summarise(Outcome = mean(st[,input$DV])) return(bp) }) bar UPDATE Right now, this code successfully filters the data by the input$state_name, but there is an issue with the calculation of means. The result is this: # A tibble: 2 x 2 tract_type Outcome <chr> <dbl> 1 Contiguous 468296. 2 LICs 468296. As you can see, the means that are calculated are identical. In fact, these values correspond to the grand average mean for whichever variable is chosen for input$DV. Therefore, the filtered st data is not being successfully grouped into the two levels of tract_type.
I see what you are trying to do. The difference is that in your reactive part you try to calculate the mean of a string, which won't work. What you want to do is summarise one of the columns in df by providing the name In the following example, I specify the summarising variable manually. Note that investment_score_1_low_10_high does not have quotes. investment_score_1_low_10_high is what is called a symbol in R. st <- df %>% filter(state == "Alabama") %>% group_by(tract_type) %>% summarise(Outcome = mean(investment_score_1_low_10_high)) But I think this should work: bar <- reactive({ # Create a symbol from string. mean_variable <- sym(input$DV) bp <- df %>% filter(state == input$state_name) %>% group_by(tract_type) %>% summarise(Outcome = mean(!! mean_variable, na.rm = TRUE)) return(bp) }) Extra information about the use of !! and what it does can be found here: Here And even better with examples Here
Solution derived by #dylanvanw bar <- reactive({ # Create a symbol from string. mean_variable <- sym(input$DV) bp <- df %>% filter(state == input$state_name) %>% group_by(tract_type) %>% summarise(Outcome = mean(!! mean_variable, na.rm = TRUE)) return(bp) })
Selectively apply custom function based on criteria
I am working with this dataframe: structure(list(year = c("2012", "2016", "2012", "2016"), month = c("12", "12", "12", "12"), company = c("ALSN", "ALSN", "DAN", "DAN"), Revenue = c(2141.8, 1840.2, 7224, 5826), `Cost of Goods Sold` = c(1187.5, 976, 6250, 4982), `Gross Profit` = c(954.3, 864.2, 974, 844 ), `Gross Margin %` = c(44.56, 46.96, 13.48, 14.49), `Selling, General, & Admin. Expense` = c(419, 323.9, 424, 406), `Impairment Of Capital Assets` = c(0, 0, 2, 0), Advertising = c(1, 1, 1, 1), `Research & Development` = c(115.1, 88.8, 0, 0), `Restructuring And Mergern Acquisition` = c(0, 0, 47, 0), `Other Operating Expense` = c(-5.68434188608e-14, 1.13686837722e-13, 121, 8), `Operating Income` = c(420.2, 451.5, 429, 430), `Operating Margin %` = c(19.62, 24.54, 5.94, 7.38), `Interest Income` = c(0.9, 0.7, 24, 13), `Interest Expense` = c(-152.1, -101.6, -84, -113), `Net Interest Income` = c(-151.2, -100.9, -60, -100), `Other Income (Expense)` = c(-52.8, -9.3, -5, -115), `Non Operating Income` = c(-52.8, -9.3, -5, -115), `Other Income (Minority Interest)` = c(0, 0, -15, -13), `Gain on Sale of Security` = c(-1.3, -0.8, 0, 7), `Write Off` = c(1, 1, 1, 1), `Pre-Tax Income` = c(216.2, 341.3, 364, 215), `Tax Provision` = c(298, -126.4, -51, 424 ), `Tax Rate %` = c(-137.84, 37.03, 14.01, -197.21), `Net Income (Continuing Operations)` = c(514.2, 214.9, 315, 653), `Net Income (Discontinued Operations)` = c(0, 0, 0, 0), `Net Income` = c(514.2, 214.9, 300, 640), `Net Margin %` = c(24.01, 11.68, 4.15, 10.99), `Preferred Dividends` = c(0, 0, 31, 0), `EPS (Basic)` = c(2.83, 1.28, 1.82, 4.38), `EPS (Diluted)` = c(2.76, 1.27, 1.4, 4.36), `Shares Outstanding (Diluted Average)` = c(186.2, 168.8, 214.7, 146.8), `Depreciation, Depletion and Amortization` = c(252.5, 175.9, 277, 182), EBITDA = c(620.8, 618.8, 725, 510)), .Names = c("year", "month", "company", "Revenue", "Cost of Goods Sold", "Gross Profit", "Gross Margin %", "Selling, General, & Admin. Expense", "Impairment Of Capital Assets", "Advertising", "Research & Development", "Restructuring And Mergern Acquisition", "Other Operating Expense", "Operating Income", "Operating Margin %", "Interest Income", "Interest Expense", "Net Interest Income", "Other Income (Expense)", "Non Operating Income", "Other Income (Minority Interest)", "Gain on Sale of Security", "Write Off", "Pre-Tax Income", "Tax Provision", "Tax Rate %", "Net Income (Continuing Operations)", "Net Income (Discontinued Operations)", "Net Income", "Net Margin %", "Preferred Dividends", "EPS (Basic)", "EPS (Diluted)", "Shares Outstanding (Diluted Average)", "Depreciation, Depletion and Amortization", "EBITDA"), row.names = c(NA, 4L), class = "data.frame") Constants: startDate <- "2012-01-01" endDate <- "2016-12-31" What I want: to create a function that applies a custom function to all numeric columns. I am trying to calculate CAGRs. The CAGR formula is as such: ((End Value / Beginning Value)^(1/number of years)-1) So as you can see, I need for each column to be able to find the correct end value and beginning value. My function right now is this: cagr <- function(startval,endval,x,y,years){ return(((endval[x == year(endDate)]/startval[y == year(startDate)])^(1/(years-1)))-1) } cagrNew <- function(df,colum,x,y,years){ colum <- quo(colum) x <- quo(x) y <- quo(y) out <- df %>% group_by(!!company) %>% summarise(xxxx = cagr(!!colum[!!x == year(endDate)],!!colum[!!y == year(startDate)],!!x,!!y,numYears)) return(out) } When I run the above function (cagrNEW), I get this error: Error in `[.formula`(colum, !(!x == year(endDate))) : attempt to set an attribute on NULL My desired output: Company RevenueCagr Cost of Goods Sold CAGR .... ALSN .5% .3% DAN .3% .2%
I haven't repeated the data above, to conserve space. Convert to tibble and assign. # df <- as_tibble(...) library(tidyverse) library(scales) #< For percentage formatting start_year <- 2012 end_year <- 2016 df %>% filter(year %in% c(start_year, end_year)) %>% group_by(company) %>% arrange(desc(year), .by_group = TRUE) %>% summarise_if(is.double, funs(CAGR = percent( (.[[1]]/.[[2]])^ (1/(end_year - start_year)) - 1) ) ) # CAGR = ((End Value / Beginning Value)^(1/number of years)-1) #Checksum: ALSN Company, Revenue # (End Value / Beginning Value)^((1/number of years))-1 percent(( (1840 / 2142) ^ (1/(2016-2012)) - 1)) #> [1] "-3.73%"