I'm making an app that will predict an NFL running back's number of rush attempts and rush yards AFTER a season of 1800+ rush yards. I use slider inputs for the number of rushing yards and attempts, which get run through lm() and predict() to return estimates for next year's attempts and rush yards (I know it's not a very good predictor at all; this is just an exercise in building a Shiny app). Here's the data from my Excel file, followed by the code.
Player Yr. Team Attempts Att.Next.Yr Yards Yards.Next.Yr YPC YPC.Next.Yr
1 Adrian Peterson 2012 MIN 348 279 2097 1266 6.0 4.5
2 Chris Johnson 2009 TEN 358 316 2006 1364 5.6 4.3
3 LaDainian Tomlinson 2006 SD 348 315 1815 1474 5.2 4.7
4 Shaun Alexander 2005 SEA 370 252 1880 896 5.1 3.6
5 Tiki Barber 2005 NYG 357 327 1860 1662 5.2 5.1
6 Jamal Lewis 2003 BAL 387 235 2066 1006 5.3 4.3
7 Ahman Green 2003 GB 355 259 1883 1163 5.3 4.5
8 Ricky Williams 2002 MIA 383 392 1853 1372 4.8 3.5
9 Terrell Davis 1998 DEN 392 67 2008 211 5.1 3.1
10 Jamal Anderson 1998 ATL 410 19 1846 59 4.5 3.1
11 Barry Sanders 1997 DET 335 343 2053 1491 6.1 4.3
12 Barry Sanders 1994 DET 331 314 1883 1500 5.7 4.8
13 Eric Dickerson 1986 RAM 404 60 1821 277 4.5 4.6
14 Eric Dickerson 1984 RAM 379 292 2105 1234 5.6 4.2
15 Eric Dickerson 1983 RAM 390 379 1808 2105 4.6 5.6
16 Earl Campbell 1980 HOU 373 361 1934 1376 5.2 3.8
17 Walter Payton 1977 CHI 339 333 1852 1395 5.5 4.2
18 O.J. Simpson 1975 BUF 329 290 1817 1503 5.5 5.2
19 O.J. Simpson 1973 BUF 332 270 2003 1125 6.0 4.2
20 Jim Brown 1963 CLE 291 280 1863 1446 6.4 5.2
Server.R
# server.R
library(UsingR)
library(xlsx)
rawdata <- read.xlsx("RushingYards.xlsx", sheetIndex=1)
data <- rawdata[c(2:21),]
rownames(data) <- NULL
# Att
set.seed(1)
fitAtt <- lm(Att.Next.Yr ~ Yards + Attempts, data)
# Yds
set.seed(1)
fitYds <- lm(Yards.Next.Yr ~ Yards + Attempts, data)
shinyServer(
  function(input, output) {
    output$newPlot <- renderPlot({
      iYards <- input$Yards
      iAttempts <- input$Attempts
      test <- data.frame(iYards, iAttempts)
      names(test) <- c("Yards", "Attempts")
      predictAtt <- predict(fitAtt, test)
      predictYds <- predict(fitYds, test)
      qplot(data=data, x=Attempts, y=Yards) +
        geom_point(aes(x=predictAtt, y=predictYds, color="Estimate"))
      output$renderYds <- renderPrint({predictYds})
      output$renderAtt <- renderPrint({predictAtt})
    })
  }
)
UI.R
# ui.R
shinyUI(pageWithSidebar(
  headerPanel("Rushing Projections"),
  sidebarPanel(
    sliderInput('Yards', 'How many yards rushed for this season',
                value=1700, min=1500, max=2500, step=25),
    sliderInput('Attempts', 'How many attempts this season',
                value=350, min=250, max=450, step=5),
    submitButton('Submit')
  ),
  mainPanel(
    plotOutput('newPlot'),
    h3('Predicted rushing yards next year: '),
    verbatimTextOutput("renderYds"),
    h3('Predicted attempts next year: '),
    verbatimTextOutput("renderAtt")
  )
))
The problem I'm having is I can't seem to output BOTH the plot (next year's estimates plotted in red against historical performances for running backs > 1800 rush yards) and the text of next year's estimated rushing yards and attempts at the same time. I can get one or the other to show up depending on where I put those statements. If I put
output$renderYds <- renderPrint({predictYds})
output$renderAtt <- renderPrint({predictAtt})
outside of the output$newPlot expression (but still inside function(input, output)), I can get the plot to show up and the point for next year's estimates changes as the input changes, but the text outputs fail with the error messages
object 'predictYds' not found and object 'predictAtt' not found. If I put those two lines inside the renderPlot() expression (as in the code above), the two text values show up correctly but the plot doesn't generate.
Can anyone help with this please?
I changed the structure of Server.R and now it works.
shinyServer(function(input, output) {
  predictYds <- function(Y, A) {
    test <- data.frame(Y, A)
    names(test) <- c("Yards", "Attempts")
    predict(fitYds, test)
  }
  predictAtt <- function(Y, A) {
    test <- data.frame(Y, A)
    names(test) <- c("Yards", "Attempts")
    predict(fitAtt, test)
  }
  output$newPlot <- renderPlot({
    newYards <- predictYds(input$Yards, input$Attempts)
    newAttempts <- predictAtt(input$Yards, input$Attempts)
    qplot(data=data, x=Attempts, y=Yards) +
      geom_point(aes(x=newAttempts, y=newYards, color="Estimate"))
  })
  output$renderYds <- renderPrint({predictYds(input$Yards, input$Attempts)})
  output$renderAtt <- renderPrint({predictAtt(input$Yards, input$Attempts)})
})
Basically, predictYds and predictAtt were rewritten as ordinary functions that are called inside the render functions with the input values.
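An equivalent, slightly more idiomatic sketch (assuming fitAtt, fitYds, and data are defined above shinyServer() as before) wraps the predictions in reactive() expressions, so each one is computed once per input change and shared by the plot and the text outputs:
shinyServer(function(input, output) {
  # Reactive expressions are cached and re-run only when the inputs change
  newdata <- reactive({
    data.frame(Yards = input$Yards, Attempts = input$Attempts)
  })
  predAtt <- reactive(predict(fitAtt, newdata()))
  predYds <- reactive(predict(fitYds, newdata()))
  output$newPlot <- renderPlot({
    qplot(data = data, x = Attempts, y = Yards) +
      geom_point(aes(x = predAtt(), y = predYds(), color = "Estimate"))
  })
  output$renderYds <- renderPrint(predYds())
  output$renderAtt <- renderPrint(predAtt())
})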
I am using this code to loop through multiple url's to scrape data. The code works fine until it comes to a date that has missing data. This is the error message that pops up:
Error in data.frame(away, home, away1H, home1H, awayPinnacle, homePinnacle) :
arguments imply differing number of rows: 7, 8
I am very new to coding and could not figure out how to make it keep scraping despite the missing data.
library(rvest)
library(dplyr)
get_data <- function(date) {
# Specifying URL
url <- paste0('https://classic.sportsbookreview.com/betting-odds/nba-basketball/money-line/1st-half/?date=', date)
# Reading the HTML code from website
oddspage <- read_html(url)
# Using CSS selectors to scrape away teams
awayHtml <- html_nodes(oddspage,'.eventLine-value:nth-child(1) a')
#Using CSS selectors to scrape 1Q scores
away1QHtml <- html_nodes(oddspage,'.current-score+ .first')
away1Q <- html_text(away1QHtml)
away1Q <- as.numeric(away1Q)
home1QHtml <- html_nodes(oddspage,'.score-periods+ .score-periods .current-score+ .period')
home1Q <- html_text(home1QHtml)
home1Q <- as.numeric(home1Q)
#Using CSS selectors to scrape 2Q scores
away2QHtml <- html_nodes(oddspage,'.first:nth-child(3)')
away2Q <- html_text(away2QHtml)
away2Q <- as.numeric(away2Q)
home2QHtml <- html_nodes(oddspage,'.score-periods+ .score-periods .period:nth-child(3)')
home2Q <- html_text(home2QHtml)
home2Q <- as.numeric(home2Q)
#Creating First Half Scores
away1H <- away1Q + away2Q
home1H <- home1Q + home2Q
#Using CSS selectors to scrape scores
awayScoreHtml <- html_nodes(oddspage,'.first.total')
awayScore <- html_text(awayScoreHtml)
awayScore <- as.numeric(awayScore)
homeScoreHtml <- html_nodes(oddspage, '.score-periods+ .score-periods .total')
homeScore <- html_text(homeScoreHtml)
homeScore <- as.numeric(homeScore)
# Converting away data to text
away <- html_text(awayHtml)
# Using CSS selectors to scrape home teams
homeHtml <- html_nodes(oddspage,'.eventLine-value+ .eventLine-value a')
# Converting home data to text
home <- html_text(homeHtml)
# Using CSS selectors to scrape Away Odds
awayPinnacleHtml <- html_nodes(oddspage,'.eventLine-consensus+ .eventLine-book .eventLine-book-value:nth-child(1) b')
# Converting Away Odds to Text
awayPinnacle <- html_text(awayPinnacleHtml)
# Converting Away Odds to numeric
awayPinnacle <- as.numeric(awayPinnacle)
# Using CSS selectors to scrape Pinnacle Home Odds
homePinnacleHtml <- html_nodes(oddspage,'.eventLine-consensus+ .eventLine-book .eventLine-book-value+ .eventLine-book-value b')
# Converting Home Odds to Text
homePinnacle <- html_text(homePinnacleHtml)
# Converting Home Odds to Numeric
homePinnacle <- as.numeric(homePinnacle)
# Create Data Frame
df <- data.frame(away,home,away1H,home1H,awayPinnacle,homePinnacle)
}
date_vec <- sprintf('201902%02d', 02:06)
all_data <- do.call(rbind, lapply(date_vec, get_data))
View(all_data)
I'd recommend purrr::map_dfr() instead of lapply() with do.call(rbind, ...). Then you can wrap your call to get_data() in possibly(), which is a nice way to catch errors and keep going.
library(purrr)
map_dfr(date_vec, possibly(get_data, otherwise = data.frame()))
Output:
away home away1H home1H awayPinnacle homePinnacle
1 L.A. Clippers Detroit 47 65 116 -131
2 Milwaukee Washington 73 50 -181 159
3 Chicago Charlotte 60 51 192 -220
4 Brooklyn Orlando 48 44 121 -137
5 Indiana Miami 53 54 117 -133
6 Dallas Cleveland 58 55 -159 140
7 L.A. Lakers Golden State 58 63 513 -651
8 New Orleans San Antonio 50 63 298 -352
9 Denver Minnesota 61 64 107 -121
10 Houston Utah 63 50 186 -213
11 Atlanta Phoenix 58 57 110 -125
12 Philadelphia Sacramento 52 62 -139 123
13 Memphis New York 42 41 -129 114
14 Oklahoma City Boston 58 66 137 -156
15 L.A. Clippers Toronto 51 65 228 -263
16 Atlanta Washington 61 57 172 -196
17 Denver Detroit 55 68 -112 -101
18 Milwaukee Brooklyn 51 42 -211 184
19 Indiana New Orleans 53 50 -143 127
20 Houston Phoenix 63 57 -256 222
21 San Antonio Sacramento 59 63 -124 110
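If you'd rather stay in base R, tryCatch() can play the same role as possibly() (a sketch; safe_get_data is just an illustrative wrapper name):
# Return an empty data frame for any date whose scrape fails,
# so the rbind() simply skips that date
safe_get_data <- function(date) {
  tryCatch(get_data(date), error = function(e) data.frame())
}
all_data <- do.call(rbind, lapply(date_vec, safe_get_data))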
I want to create a weighted average of the baseball statistic WAR from 2017 to 2019.
The Averages would go as following:
2019: 57.14%
2018: 28.57%
2017: 14.29%
However, some players only played in 2018 and 2019, while others played only in 2017 and 2019.
If they've only played in two years it would be 67/33, and only one year would be 100% obviously.
I was wondering if there was an easy way to do this.
My data set looks like this
Name Season G PA HR BB_pct K_pct ISO wOBA wRC_plus Def WAR
337 A.J. Pollock 2017 112 466 14 7.5 15.2 0.205 0.340 103 2.6 2.2
357 A.J. Pollock 2018 113 460 21 6.7 21.7 0.228 0.338 111 0.9 2.6
191 Aaron Altherr 2017 107 412 19 7.8 25.2 0.245 0.359 120 -7.9 1.4
162 Aaron Hicks 2017 88 361 15 14.1 18.6 0.209 0.363 128 6.4 3.4
186 Aaron Hicks 2018 137 581 27 15.5 19.1 0.219 0.360 129 2.3 5.0
464 Aaron Hicks 2019 59 255 12 12.2 28.2 0.208 0.325 102 1.3 1.1
The years vary from person to person, so I was wondering if anyone had a way to do this weighted average depending on the years a player appeared in. I also don't want any 2017-only players, if that makes sense.
I guess there is an easier way of doing your task; unfortunately my approach is a little bit more complex. I'm using dplyr and purrr.
First I put those weights into a list:
# Weights are listed oldest season first, to match the ascending-Season
# row order within each player (the most recent season gets the largest weight)
one_year <- 1
two_years <- c(1/3, 2/3)
three_years <- c(1/7, 2/7, 4/7)
weights <- list(one_year, two_years, three_years)
Next I split the datset into a list by the number of seasons each player took part:
df %>%
group_by(Name) %>%
mutate(n=n()) %>%
arrange(n) %>%
ungroup() %>%
group_split(n) -> my_list
Now I define a function that calculates the average using the weights:
WAR_average <- function(i) {my_list[[i]] %>%
group_by(Name) %>%
mutate(WAR_average = sum(WAR * weights[[i]]))}
And finally I apply the function WAR_average on my_list and filter/select the data:
my_list %>%
seq_along() %>%
lapply(WAR_average) %>% # apply function
reduce(rbind) %>% # bind the dataframes into one df
filter(Season != 2017 | n != 1) %>% # filter players only active in 2017
select(Name, WAR_average) %>% # select player and war_average
distinct() # remove duplicates
This whole process returns
# A tibble: 2 x 2
# Groups: Name [2]
Name WAR_average
<chr> <dbl>
1 A.J. Pollock 2.47
2 Aaron Hicks 2.54
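For what it's worth, here is a more compact sketch of the same calculation using only dplyr (assuming the same df as above; players whose only season is 2017 are dropped up front, and the weight vector is picked by the number of seasons, newest season first):
library(dplyr)
# One weight vector per possible number of seasons, newest season first
season_weights <- list(1, c(2, 1) / 3, c(4, 2, 1) / 7)
df %>%
  group_by(Name) %>%
  filter(!all(Season == 2017)) %>%              # drop 2017-only players
  arrange(desc(Season), .by_group = TRUE) %>%   # newest season first within each player
  summarise(WAR_average = sum(WAR * season_weights[[n()]]))
With the sample rows shown above this reproduces the same two values (about 2.47 and 2.54).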
Say I have an R data.table DT which has a list of returns:
Date Return
2016-01-01 -0.01
2016-01-02 0.022
2016-01-03 0.1111
2016-01-04 -0.006
...
I want to do a rolling multiple regression that uses the previous N observations of Return to predict the next Return, estimated over some window K. E.g., over the last K = 120 days, fit a regression on the last N = 14 observations to predict the next observation. Once I have this regression, I want to use the predict function to get a prediction for each row based on it. In pseudocode it would be something like:
DT[, Prediction := predict(lm(Return[prev K - N -1] ~ Return[N observations prev for each observation]), Return[N observations previous for this observation])]
To be clear, I want a multiple regression, so if N were 3 it would be:
lm(Return ~ Return[-1] + Return[-2] + Return[-3]) ## where the negatives are the prev rows
How do I write this (as efficiently as possible)?
Thanks
If I understand correctly you want a quarterly auto-regression.
There's a related thread on time-series with data.table here.
You can setup a rolling date in data.table like this (see the link above for more context):
#Example for quarterly data
quarterly[, rollDate:=leftBound]
storeData[, rollDate:=date]
setkey(quarterly,"rollDate")
setkey(storeData,"rollDate")
Since you only provided a few rows of example data, I extended the series through 2019 and made up random return values.
First get your data setup:
require(forecast)
require(xts)
DT <- read.table(file("clipboard"))
dput(DT) # the dput was too long to display here
DT[,1] <- as.POSIXct(strptime(DT[,1], "%m/%d/%Y"))
DT[,2] <- as.double(DT[,2])
dat <- xts(DT$V2, order.by = DT$V1)
x.ts <- to.quarterly(dat) # 120 days
dat.Open dat.High dat.Low dat.Close
2016 Q1 1292 1292 1 698
2016 Q2 138 1290 3 239
2016 Q3 451 1285 5 780
2016 Q4 355 1243 27 1193
2017 Q1 878 1279 4 687
2017 Q2 794 1283 12 411
2017 Q3 858 1256 9 1222
2017 Q4 219 1282 15 117
2018 Q1 554 1286 32 432
2018 Q2 630 1272 30 46
2018 Q3 310 1288 18 979
2019 Q1 143 1291 10 184
2019 Q2 250 1289 8 441
2019 Q3 110 1220 23 571
Then you can do a rolling ARIMA model with or without re-estimation like this:
fit <- auto.arima(x.ts)
order <- arimaorder(fit)
h <- 1                                  # one-quarter-ahead forecasts
n <- nrow(x.ts)
fcmat <- matrix(0, nrow = n, ncol = h)
for (i in 1:n) {
  x <- window(x.ts, end = 2017.99 + (i - 1) / 4)
  refit <- Arima(x, order = order[1:3], seasonal = order[4:6])
  fcmat[i, ] <- forecast(refit, h = h)$mean
}
Here's a good related resource with several examples of different ways you might construct this: http://robjhyndman.com/hyndsight/rolling-forecasts/
You have to have the lags in columns anyway, so if I understand you correctly you can do something like this, say for a lag of 3:
setkey(DT, date)
lag_max <- 3
for (i in 1:lag_max) {
  set(DT, NULL, paste0("lag", i), shift(DT[["return"]], i, type = "lag"))
}
# na.exclude keeps the fitted values aligned with the rows whose lags are NA
DT[, prediction := fitted(lm(return ~ lag1 + lag2 + lag3, na.action = na.exclude))]
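That gives in-sample fitted values over the whole series. For the rolling out-of-sample version described in the question (refit on the previous K rows, then predict the current row), a sketch along these lines should work; K = 120 and the lowercase column names are carried over from the snippet above, and the table is assumed to have more than K rows:
library(data.table)
K <- 120
DT[, prediction_oos := {
  p <- rep(NA_real_, .N)
  for (t in (K + 1):.N) {
    train <- .SD[(t - K):(t - 1)]                    # previous K rows only
    fit   <- lm(return ~ lag1 + lag2 + lag3, data = train)
    p[t]  <- predict(fit, newdata = .SD[t])          # one-step-ahead prediction
  }
  p
}]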
My data set includes various observations at different stages throughout the year:
year: year when the samples were collected
site: location of measurement
Class: physical stage at the time of measurement
date: date of measurement
Julian: Julian day of the year
The final measurements usually occur in the early part of the new year, which is summer in the southern hemisphere (the seasons are reversed relative to the north, so their summer is our winter and their spring is our fall).
year site Class date Julian
1 2009 10C Early 2008-09-15 259
2 2009 10C L2 2008-09-29 273
3 2009 10C L3 2008-12-15 350
4 2010 10C Early 2009-08-31 243
5 2010 10C L2 2009-09-14 257
6 2010 10C L3 2009-12-11 345
7 2012 10C Early 2011-08-23 235
8 2012 10C L2 2011-09-22 265
9 2012 10C L3 2011-12-03 337
10 2012 10C LSample 2012-03-26 86
11 2013 10C Early 2012-09-07 251
12 2013 10C L2 2012-09-30 274
13 2013 10C L3 2012-12-17 352
14 2014 10C Early 2013-09-02 245
15 2014 10C L2 2013-09-16 259
16 2014 10C L3 2013-12-16 350
17 2014 10C LMid 2014-01-07 7
18 2015 10C Early 2014-09-08 251
19 2015 10C L2 2014-09-30 273
20 2015 10C L3 2014-12-01 335
I am having a difficult time converting/reassigning the Julian day so that it is counted from July 1st instead of January 1st. (A dot plot of the data shows the final sampling occurring at the beginning of the calendar year, in February-March.)
The chron package has an option to change the origin, but I cannot get it to work properly with my data.
library(chron)
library(dplyr)
data.date <- data %>%
mutate(July.Julian = chron(date,format = c(dates = "ymd"), options(chron.origin = c(month=7, day=1, year=2008))))
Error in chron(c("2008-09-15", "2008-09-29", "2008-12-15", "2009-08-31", :
misspecified chron format(s) length
or
July.Julian = chron(data$date, format = c(dates = "ymd"), options(chron.origin = c(month=7, day=1, year=2008)))
Error in chron(c("2008-09-15", "2008-09-29", "2008-12-15", "2009-08-31", :
misspecified chron format(s) length
I am trying to get the Julian date to start at 1 on July 1st instead of 182.
Thoughts or suggestions are welcome.
Assuming that July.Julian is supposed to be Julian days past July 1st:
transform(date.data, July.Julian = as.chron(sprintf("%d-07-01", year)) + Julian)
or
date.data %>% mutate(July.Julian = as.chron(sprintf("%d-07-01", year)) + Julian)
Note that one does not actually need chron here: just replace as.chron with as.Date and either of these works.
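If the goal is a day count that restarts at 1 on July 1st, here is a base R sketch (no chron needed; it assumes the date column is stored as a Date):
data$date <- as.Date(data$date)
yr <- as.integer(format(data$date, "%Y"))
mo <- as.integer(format(data$date, "%m"))
# The most recent July 1st on or before each date
july1 <- as.Date(sprintf("%d-07-01", ifelse(mo >= 7, yr, yr - 1)))
data$July.Julian <- as.integer(data$date - july1) + 1   # July 1st = day 1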
When working with data (e.g., in a data.frame), you can control the number of displayed digits by using
options(digits=3)
and then printing the data.frame like this:
ttf.all
But when you need to paste the data into Excel like this:
write.table(ttf.all, 'clipboard', sep='\t',row.names=F)
The digits parameter is ignored and numbers are not rounded.
The console output looks nice:
> ttf.all
year V1.x.x V1.y.x ratio1 V1.x.y V1.y.y ratioR V1.x.x V1.y.x ratioAL V1.x.y V1.y.y ratioRL
1 2006 227 645 35.2 67 645 10.4 150 645 23.3 53 645 8.22
2 2007 639 1645 38.8 292 1645 17.8 384 1645 23.3 137 1645 8.33
3 2008 1531 3150 48.6 982 3150 31.2 755 3150 24.0 235 3150 7.46
4 2009 1625 3467 46.9 1026 3467 29.6 779 3467 22.5 222 3467 6.40
But what ends up in Excel (via the clipboard) is not rounded. How can I control this in write.table()?
You can use the function format() as in:
write.table(format(ttf.all, digits=2), 'clipboard', sep='\t',row.names=F)
format() is a generic function that has methods for many classes, including data.frames. Unlike round(), it won't throw an error if your dataframe is not all numeric. For more details on the formatting options, see the help file via ?format
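For example, on a small data frame with mixed columns (toy data, just to illustrate the difference):
dd <- data.frame(id = c("a", "b"), x = c(1.23456, 2.34567))
# round(dd, 2)          # would error: non-numeric column 'id'
format(dd, digits = 3)  # works; values are formatted as text for output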
Adding a solution for a data frame with mixed character and numeric columns. We first use mutate_if() to select the numeric columns and then apply round() to them.
# install.packages('dplyr', dependencies = TRUE)
library(dplyr)
df <- read.table(text = "id year V1.x.x V1.y.x ratio1
a 2006 227.11111 645.11111 35.22222
b 2007 639.11111 1645.11111 38.22222
c 2008 1531.11111 3150.11111 48.22222
d 2009 1625.11111 3467.11111 46.22222",
header = TRUE, stringsAsFactors = FALSE)
df %>%
mutate_if(is.numeric, round, digits = 2)
#> id year V1.x.x V1.y.x ratio1
#> 1 a 2006 227.11 645.11 35.22
#> 2 b 2007 639.11 1645.11 38.22
#> 3 c 2008 1531.11 3150.11 48.22
#> 4 d 2009 1625.11 3467.11 46.22
### dplyr v1.0.0+
df %>%
mutate(across(where(is.numeric), ~ round(., digits = 2)))
#> id year V1.x.x V1.y.x ratio1
#> 1 a 2006 227.11 645.11 35.22
#> 2 b 2007 639.11 1645.11 38.22
#> 3 c 2008 1531.11 3150.11 48.22
#> 4 d 2009 1625.11 3467.11 46.22
Created on 2019-03-17 by the reprex package (v0.2.1.9000)