Plot graphs in R while showing all the x axis values

I'm trying to plot a graph with my data.
My code for that is
plot(birthRate$country_code, birthRate$yr2014, main = "Birth Rate by Countries 2014")
My output looks like this:
But I want to show all the values on the x axis.
dput(birthRate):
structure(list(series_code = structure(c(21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L), .Label = c("NY.GNP.PCAP.CD",
"SE.PRM.ENRR", "SE.SEC.ENRR", "SE.TER.ENRR", "SE.TER.ENRR.FE",
"SH.ALC.PCAP.LI", "SH.DTH.COMM.ZS", "SH.DTH.INJR.ZS", "SH.DTH.NCOM.ZS",
"SH.IMM.IBCG", "SH.STA.MMRT.NE", "SH.STA.TRAF.P5", "SH.XPD.PCAP",
"SH.XPD.PRIV.ZS", "SH.XPD.PUBL.ZS", "SH.XPD.TOTL.ZS", "SL.UEM.TOTL.FE.ZS",
"SL.UEM.TOTL.MA.ZS", "SL.UEM.TOTL.ZS", "SP.ADO.TFRT", "SP.DYN.CBRT.IN",
"SP.DYN.CDRT.IN", "SP.DYN.LE00.FE.IN", "SP.DYN.LE00.IN",
"SP.DYN.LE00.MA.IN",
"SP.DYN.TFRT.IN"), class = "factor"), country_name = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 14L, 15L, 17L,
19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 28L, 29L), .Label = c("Australia",
"Brunei Darussalam", "Cambodia", "China", "Fiji", "Indonesia",
"Japan", "Kiribati", "Korea, Dem. People’s Rep.", "Korea, Rep.",
"Lao PDR", "Malaysia", "Marshall Islands", "Micronesia, Fed. Sts.",
"Mongolia", "Nauru", "New Zealand", "Palau", "Papua New Guinea",
"Philippines", "Samoa", "Singapore", "Solomon Islands", "Thailand",
"Timor-Leste", "Tonga", "Tuvalu", "Vanuatu", "Vietnam"), class = "factor"),
country_code = structure(c(1L, 2L, 8L, 3L, 4L, 6L, 7L, 9L,
20L, 10L, 11L, 14L, 5L, 13L, 16L, 19L, 17L, 29L, 21L, 22L,
23L, 24L, 25L, 28L, 27L), .Label = c("AUS", "BRN", "CHN",
"FJI", "FSM", "IDN", "JPN", "KHM", "KIR", "KOR", "LAO", "MHL",
"MNG", "MYS", "NRU", "NZL", "PHL", "PLW", "PNG", "PRK", "SGP",
"SLB", "THA", "TLS", "TON", "TUV", "VNM", "VUT", "WSM"), class = "factor"),
yr2001 = c(12.7, 20.913, 27.327, 13.38, 24.41, 21.486, 9.3,
30.228, 17.414, 11.6, 30.999, 21.445, 29.21, 19.035, 14.36,
34.396, 29.301, 30.269, 11.8, 35.403, 14.025, 41.441, 28.365,
31.84, 17.13), yr2002 = c(12.8, 20.137, 26.793, 12.86, 24.103,
21.49, 9.3, 29.965, 16.92, 10.2, 30.287, 20.39, 28.453, 19.001,
13.67, 33.95, 28.892, 29.991, 11.4, 35.226, 13.653, 40.428,
28.468, 31.219, 16.921), yr2003 = c(12.6, 19.522, 26.44,
12.41, 23.804, 21.5, 9.2, 29.775, 16.431, 10.2, 29.753, 19.435,
27.669, 19.209, 13.94, 33.49, 28.404, 29.778, 10.5, 35.061,
13.32, 39.726, 28.565, 30.597, 16.839), yr2004 = c(12.3,
19.065, 26.24, 12.29, 23.508, 21.499, 8.6936, 29.647, 15.961,
9.8, 29.38, 18.62, 26.886, 19.627, 14.2, 33.03, 27.845, 29.624,
10.3, 34.889, 13.025, 39.368, 28.624, 29.993, 16.848), yr2005 = c(12.8,
18.738, 26.145, 12.4, 23.208, 21.476, 8.4133, 29.572, 15.532,
8.9, 29.134, 17.971, 26.139, 20.223, 13.96, 32.575, 27.238,
29.499, 10.2, 34.68, 12.764, 39.326, 28.611, 29.427, 16.919
), yr2006 = c(12.9, 18.499, 26.098, 12.09, 22.901, 21.429,
8.65, 29.537, 15.166, 9.2, 28.966, 17.498, 25.461, 20.959,
14.14, 32.121, 26.619, 29.355, 10.3, 34.409, 12.533, 39.509,
28.499, 28.92, 17.03), yr2007 = c(14.1, 18.292, 26.043, 12.1,
22.586, 21.364, 8.63, 29.528, 14.87, 10, 28.821, 17.171,
24.872, 21.769, 15.15, 31.659, 26.025, 29.148, 10, 34.063,
12.323, 39.752, 28.288, 28.475, 17.163), yr2008 = c(14, 18.07,
25.937, 12.14, 22.263, 21.283, 8.7, 29.526, 14.648, 9.4,
28.651, 16.954, 24.385, 22.576, 15.1, 31.186, 25.489, 28.845,
10.2, 33.637, 12.123, 39.92, 27.982, 28.091, 17.298), yr2009 = c(13.9,
17.809, 25.755, 12.13, 21.929, 21.177, 8.5, 29.513, 14.498,
9, 28.429, 16.828, 24.011, 23.311, 14.53, 30.706, 25.023,
28.442, 9.9, 33.132, 11.927, 39.95, 27.588, 27.766, 17.409
), yr2010 = c(13.7, 17.499, 25.491, 11.9, 21.583, 21.034,
8.5, 29.468, 14.411, 9.4, 28.142, 16.773, 23.751, 23.892,
14.68, 30.229, 24.634, 27.944, 9.3, 32.555, 11.725, 39.8,
27.112, 27.486, 17.473), yr2011 = c(13.6, 17.146, 25.164,
11.93, 21.221, 20.841, 8.3, 29.377, 14.374, 9.4, 27.8, 16.765,
23.598, 24.252, 14, 29.764, 24.315, 27.372, 9.5, 31.918,
11.51, 39.461, 26.57, 27.236, 17.477), yr2012 = c(13.7, 16.774,
24.812, 12.1, 20.846, 20.595, 8.2, 29.235, 14.363, 9.6, 27.43,
16.783, 23.528, 24.378, 13.87, 29.318, 24.041, 26.768, 10.1,
31.25, 11.281, 38.985, 25.992, 26.993, 17.424), yr2013 = c(13.3,
16.405, 24.462, 12.08, 20.463, 20.297, 8.2, 29.044, 14.358,
8.6, 27.051, 16.805, 23.511, 24.275, 13.2, 28.899, 23.79,
26.172, 9.3, 30.578, 11.041, 38.419, 25.409, 26.739, 17.318
), yr2014 = c(12.9, 16.043, 24.119, 12.4, 20.075, 19.955,
8, 28.8, 14.349, 8.6, 26.666, 16.811, 23.531, 23.949, 12.68,
28.51, 23.552, 25.608, 9.8, 29.921, 10.79, 37.783, 24.846,
26.466, 17.157)), .Names = c("series_code", "country_name",
"country_code", "yr2001", "yr2002", "yr2003", "yr2004", "yr2005",
"yr2006", "yr2007", "yr2008", "yr2009", "yr2010", "yr2011", "yr2012",
"yr2013", "yr2014"), row.names = c(30L, 31L, 32L, 33L, 34L, 35L,
36L, 37L, 38L, 39L, 40L, 41L, 43L, 44L, 46L, 48L, 49L, 50L, 51L,
52L, 53L, 54L, 55L, 57L, 58L), class = "data.frame", na.action =
structure(c(13L, 16L, 18L, 27L), .Names = c("42", "45", "47", "56"), class = "omit"))

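You could try rotating the x-axis labels so that they are drawn perpendicular to the axis (las=2); vertical labels take up less horizontal space, so more of them fit.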
Try this out:
plot(birthRate$country_code, birthRate$yr2014, main = "Birth Rate by Countries 2014", las=2)
Edit:
I was able to make this barplot using ggplot2.
Here's my code:
library(dplyr)    # provides arrange()
library(ggplot2)  # provides ggplot(), aes(), geom_col()
birthRate <- arrange(birthRate, yr2014)
p <- ggplot(birthRate, aes(y=yr2014, x=reorder(country_code, yr2014), fill=country_code)) +
  geom_col()
p
Note: Your dput() output has 29 observations for the country_code variable, but only 25 for the yr2014 variable. I didn't know exactly where the missing data was, so I just removed the last four observations from the country_code variable to get things to line up. Your output may look slightly different based on where the NAs are...
I hope this was helpful!
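A base-graphics alternative, sketched under assumptions: the small `birth` data frame below is a hypothetical stand-in that copies the first five rows of the question's data, and `NRU` stands for a level with no rows. In the dput() output, country_code carries 29 factor levels for 25 rows, so dropping the unused levels with droplevels() before plotting avoids empty slots on the axis. barplot() with names.arg and las=2 then prints one label per bar.

```r
# Stand-in for birthRate: first five rows of the question's data,
# plus one unused factor level ("NRU") to mimic the real factor
birth <- data.frame(
  country_code = factor(c("AUS", "BRN", "KHM", "CHN", "FJI"),
                        levels = c("AUS", "BRN", "CHN", "FJI", "KHM", "NRU")),
  yr2014 = c(12.9, 16.043, 24.119, 12.4, 20.075)
)

# Drop factor levels that have no rows, so they do not appear on the axis
birth$country_code <- droplevels(birth$country_code)

# One bar per country; las = 2 draws the labels perpendicular to the axis,
# cex.names shrinks them so that all of them fit
mids <- barplot(birth$yr2014,
                names.arg = as.character(birth$country_code),
                las = 2, cex.names = 0.8,
                main = "Birth Rate by Countries 2014")
```

With the full data set, a smaller cex.names value may be needed so that all 25 labels fit.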

Related

Can I change the color of the text loaded from ilab=cbind in Forest Plot in metafor?

Please, find my data q below.
I have produced this Forest Plot, and I would like the encircled text to be red instead of black. Can this be done?
My script
q <- escalc(measure="IRR", x1i=x1i, t1i=t1i, x2i=x2i, t2i=t2i, data=q)
q1 <- rma(yi, vi, data=q, slab=paste(study, sep=", "), method = "REML")
## Forest
forest(q1, xlim=c(-27,8), atransf=exp, showweights = FALSE, psize = 1.6, refline=log(1),
cex=0.5, ylim=c(0.1, 17), font=1, col="white", border="white", order=order(q$order),
ilab=cbind(q$x1i, q$t1i, q$ir1, q$x2i, q$t2i,q$ir2),
ilab.xpos=c(-19.3,-17,-15,-12.3,-10,-8),
rows=c(2:7,11:13),xlab="Rate ratios", mlab="")
# Headlines
text(c(-19,-16.8,-15,-12,-9.8,-8) ,15.7,font=1, cex=0.5, c("Events\n per total\n", "Person-\nyrs\n", "IR\n", "Events\n per total\n", "Person-\nyrs\n","IR\n"))
text(c(-18.75,-18.75,-18.65) ,c(13,12,11),font=1, cex=0.54, c("/ 32", "/ 32", " / 23"))
text(c(-18.75,-18.75,-18.75) ,c(7,6,5),font=1, cex=0.54, c("/ 37", "/ 37", "/ 37"))
text(c(-18.65,-18.65,-18.65) ,c(4,3,2),font=1, cex=0.54, c(" / 29", " / 29", " / 19"))
text(c(-11.65,-11.65,-11.65) ,c(13,12,11),font=1, cex=0.54, c(" /23", " /16", " /16"))
text(c(-11.65,-11.65,-11.75) ,c(7,6,5),font=1, cex=0.54, c(" /29", "/19", " /25"))
text(c(-11.65,-11.75,-11.75) ,c(4,3,2),font=1, cex=0.54, c("/19", " / 25", " / 25"))
text(8 ,15.7,font=1, "Rate ratio [95% CI]", pos=2, cex=0.5)
text(-27 ,c(14,8),font=2, c("Progression rates","Mortality rates"), pos=4, cex=0.5)
text(-27 ,c(1,10),font=1, c("\nCohort: 110 patients included","\nCohort: 76 patients included"), pos=4, cex=0.45)
My data q
q <- structure(list(study = structure(c(2L, 4L, 7L, 3L, 5L, 1L, 8L,
6L, 9L), .Label = c("WHO-I versus Unknown ", "WHO-I versus WHO-II",
"WHO-I versus WHO-II ", "WHO-I versus WHO-III", "WHO-I versus WHO-III ",
"WHO-II versus Unknown", "WHO-II versus WHO-III", "WHO-II versus WHO-III ",
"WHO-III versus Unknown"), class = "factor"), order = 9:1, x1i = c(4L,
4L, 15L, 9L, 9L, 9L, 15L, 15L, 12L), n1i = c(32L, 32L, 23L, 37L,
37L, 37L, 29L, 29L, 19L), t1i = c(74.7, 74.7, 22.8, 108.1, 108.1,
108.1, 48.3, 48.3, 27.9), x2i = c(15L, 15L, 15L, 15L, 12L, 9L,
12L, 9L, 9L), n2i = c(23L, 16L, 16L, 29L, 19L, 25L, 19L, 25L,
25L), t2i = c(22.8, 4.4, 4.4, 48.3, 27.9, 79.1, 27.9, 79.1, 79.1
), ir1 = c(5.4, 5.4, 65.7, 8.3, 8.3, 8.3, 31.1, 31.1, 43.1),
ir2 = c(65.7, 339.6, 339.6, 31.1, 43.1, 11.4, 43.1, 11.4,
11.4)), class = "data.frame", row.names = c(NA, -9L))
Changing the color of what's added via ilab isn't possible, but you can always just add the text yourself using text() (e.g., on top of the existing text). This will do it:
text(-15, rev(c(2:7,11:13)), q$ir1, col="red", font=1, cex=0.5)
text( -8, rev(c(2:7,11:13)), q$ir2, col="red", font=1, cex=0.5)
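Because text() accepts a vector for col, the same overlay idea can colour only some of the values. The following is a minimal sketch using base graphics only (no forest plot is drawn here); the `ir1` values are copied from the question's data, and the threshold of 30 is an arbitrary assumption chosen for illustration.

```r
# Row positions as used in the answer above
rows <- rev(c(2:7, 11:13))

# q$ir1 from the question's data
ir1 <- c(5.4, 5.4, 65.7, 8.3, 8.3, 8.3, 31.1, 31.1, 43.1)

# One colour per value: red above the (arbitrary) threshold, black otherwise
cols <- ifelse(ir1 > 30, "red", "black")

# Empty plotting region with the same limits as the forest() call
plot.new()
plot.window(xlim = c(-27, 8), ylim = c(0.1, 17))

# text() recycles x and pairs each label with its own colour
text(-15, rows, ir1, col = cols, font = 1, cex = 0.5)
```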

Plotting a multiple linear regression in R using scatter3D() (package plot3D)

I have the following data in a csv file.
y,x1,x2,x3,x4,x5,x6,x7,x8,x9
10,2113,1985,38.9,64.7,4,868,59.7,2205,1917
11,2003,2855,38.8,61.3,3,615,55,2096,1575
11,2957,1737,40.1,60,14,914,65.6,1847,2175
13,2285,2905,41.6,45.3,-4,957,61.4,1903,2476
10,2971,1666,39.2,53.8,15,836,66.1,1457,1866
11,2309,2927,39.7,74.1,8,786,61,1848,2339
10,2528,2341,38.1,65.4,12,754,66.1,1564,2092
11,2147,2737,37,78.3,-1,761,58,1821,1909
4,1689,1414,42.1,47.6,-3,714,57,2577,2001
2,2566,1838,42.3,54.2,-1,797,58.9,2476,2254
7,2363,1480,37.3,48,19,984,67.5,1984,2217
example = data.frame(x1,x2,x3,x4,y)
How can I graph the variables x1, x2, x3 using scatter3D(x,y,z)?
I have tried:
library("plot3D")
with(example,scatter3D(y ~ x1 + x2 + x3))
But I get error:
Error in min(x,na.rm) : invalid 'type' (list) of argument
Looks like you want to plot a regression plane. The scatter3d function in package car will do that. You need to install car and rgl. First let's make your data more accessible:
dput(example)
structure(list(y = c(10L, 11L, 11L, 13L, 10L, 11L, 10L, 11L,
4L, 2L, 7L), x1 = c(2113L, 2003L, 2957L, 2285L, 2971L, 2309L,
2528L, 2147L, 1689L, 2566L, 2363L), x2 = c(1985L, 2855L, 1737L,
2905L, 1666L, 2927L, 2341L, 2737L, 1414L, 1838L, 1480L), x3 = c(38.9,
38.8, 40.1, 41.6, 39.2, 39.7, 38.1, 37, 42.1, 42.3, 37.3), x4 = c(64.7,
61.3, 60, 45.3, 53.8, 74.1, 65.4, 78.3, 47.6, 54.2, 48), x5 = c(4L,
3L, 14L, -4L, 15L, 8L, 12L, -1L, -3L, -1L, 19L), x6 = c(868L,
615L, 914L, 957L, 836L, 786L, 754L, 761L, 714L, 797L, 984L),
x7 = c(59.7, 55, 65.6, 61.4, 66.1, 61, 66.1, 58, 57, 58.9,
67.5), x8 = c(2205L, 2096L, 1847L, 1903L, 1457L, 1848L, 1564L,
1821L, 2577L, 2476L, 1984L), x9 = c(1917L, 1575L, 2175L,
2476L, 1866L, 2339L, 2092L, 1909L, 2001L, 2254L, 2217L)),
class = "data.frame", row.names = c(NA, -11L))
install.packages("car")
install.packages("rgl")
library(car)
library(rgl)
scatter3d(y~x1+x2, example)
The plot window will be small. Use the mouse to drag the lower right corner to make it bigger. You can drag within the plot to rotate it.
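If you would rather stay with plot3D, one possible approach is to fit the model with lm(), predict it over a grid, and hand the resulting matrix to scatter3D() through its surf argument. This is a sketch under assumptions: only two predictors (x1 and x2) are used, since a single 3D plot has room for two predictors plus the response, and the grid size of 20 is arbitrary. The plotting call is guarded so that the rest of the code runs even when plot3D is not installed.

```r
# y, x1 and x2 copied from the question's data
example <- data.frame(
  y  = c(10, 11, 11, 13, 10, 11, 10, 11, 4, 2, 7),
  x1 = c(2113, 2003, 2957, 2285, 2971, 2309, 2528, 2147, 1689, 2566, 2363),
  x2 = c(1985, 2855, 1737, 2905, 1666, 2927, 2341, 2737, 1414, 1838, 1480)
)

# Fit the regression plane y = b0 + b1*x1 + b2*x2
fit <- lm(y ~ x1 + x2, data = example)

# Predict the plane on a regular grid spanning the observed x1/x2 range
grid.n  <- 20
x1.pred <- seq(min(example$x1), max(example$x1), length.out = grid.n)
x2.pred <- seq(min(example$x2), max(example$x2), length.out = grid.n)
grid    <- expand.grid(x1 = x1.pred, x2 = x2.pred)  # x1 varies fastest
z.pred  <- matrix(predict(fit, newdata = grid), nrow = grid.n, ncol = grid.n)

# Draw the points and the fitted plane (only if plot3D is available)
if (requireNamespace("plot3D", quietly = TRUE)) {
  plot3D::scatter3D(example$x1, example$x2, example$y, pch = 19,
                    xlab = "x1", ylab = "x2", zlab = "y",
                    surf = list(x = x1.pred, y = x2.pred, z = z.pred,
                                facets = NA))
}
```

The original error arises because scatter3D() expects numeric x, y and z vectors rather than a formula.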

Extracting values for specific lat long from netcdf

I'm trying to read into R a netCDF file. The netcdf chirps-v2.0.1981.days_p05.nc is downloaded from here:
ftp://ftp.chg.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/global_daily/netcdf/p05/
This netCDF file describes daily global rainfall as a function of longitude and latitude, and is about 1.1 GB in size.
I also have a set of lon/lat points:
dat <- structure(list(locatioID = paste0('ID', 1:16), lon = c(73.73, 86, 73.45, 86.41, 85.36, 81.95, 82.57, 75.66, 82.03,
81.73, 85.66, 85.31, 81.03, 81.70, 87.03, 73.38),
lat = c(24.59, 20.08, 22.61, 23.33, 23.99, 19.09, 18.85, 15.25, 26.78,
16.63, 25.98, 23.28, 24.5, 21.23, 25.08, 21.11)),
row.names = c(1L, 3L, 5L, 8L, 11L, 14L, 17L, 18L, 19L, 21L,
23L, 26L, 29L, 32L, 33L, 35L), class = "data.frame")
library(ncdf4)
library(raster)
temp <- nc_open("chirps-v2.0.1981.days_p05.nc")
precip = list()
precip$x = ncvar_get(temp, "longitude")
precip$y = ncvar_get(temp, "latitude")
precip$z = ncvar_get(temp, "precip", start=c(1, 1, 1), count=c(-1, -1, 1))
precip.r = raster(precip)
plot(precip.r)
I have two questions:
Can anyone explain what the start and count arguments do? ?ncvar_get does not give me an intuitive feel for them. If I want to create a raster for Julian day 252,
which argument do I need to change?
How do I extract the daily rainfall values for all 365 days for every lat/lon in dat, so that I end up with a matrix/data frame of 16 locations * 365 days?
You can use the following code for data extraction from .nc files
dat <- structure(list(locatioID = paste0('ID', 1:16), lon = c(73.73, 86, 73.45, 86.41, 85.36, 81.95, 82.57, 75.66, 82.03,
81.73, 85.66, 85.31, 81.03, 81.70, 87.03, 73.38),
lat = c(24.59, 20.08, 22.61, 23.33, 23.99, 19.09, 18.85, 15.25, 26.78,
16.63, 25.98, 23.28, 24.5, 21.23, 25.08, 21.11)),
row.names = c(1L, 3L, 5L, 8L, 11L, 14L, 17L, 18L, 19L, 21L,
23L, 26L, 29L, 32L, 33L, 35L), class = "data.frame")
library(raster) # provides brick() and extract(); also attaches sp for SpatialPoints()
temp <- brick("chirps-v2.0.1981.days_p05.nc")
xy <- dat[,2:3] #Column 1 is longitude and column 2 is latitude
xy
spts <- SpatialPoints(xy, proj4string=CRS("+proj=longlat +datum=WGS84"))
#Extract data by spatial point
temp2 <- extract(temp, spts)
temp3 <- t(temp2) #transpose the extracted matrix: rows become days, columns become locations
colnames(temp3) <- dat[,1] #It would be better if you have the location names corresponding to the points
head(temp3)
write.csv(temp3, "Rainfall.csv")
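On the first question: in ncvar_get(), start gives the 1-based index at which reading begins along each dimension, and count gives how many values to read along each dimension, with -1 meaning "all remaining values". Assuming the dimensions are ordered longitude, latitude, time (as the question's call suggests), day 252 would be start=c(1, 1, 252), count=c(-1, -1, 1). The sketch below mimics these semantics on a small in-memory array; `read_slab` is a hypothetical helper written for illustration and is not part of ncdf4.

```r
# Toy stand-in for the precip variable: 4 longitudes x 3 latitudes x 5 days
precip <- array(seq_len(4 * 3 * 5), dim = c(4, 3, 5))

# Hypothetical helper that imitates the start/count logic of
# ncvar_get(nc, "precip", start = start, count = count)
read_slab <- function(arr, start, count) {
  d <- dim(arr)
  # count = -1 means "from start to the end of that dimension"
  count <- ifelse(count == -1, d - start + 1, count)
  # Build one index vector per dimension
  idx <- Map(function(s, n) seq(s, length.out = n), start, count)
  # Subset the array; dimensions of length 1 are dropped
  do.call(`[`, c(list(arr), idx, list(drop = TRUE)))
}

# All longitudes, all latitudes, only day 3
day3 <- read_slab(precip, start = c(1, 1, 3), count = c(-1, -1, 1))
```

Only the third element of start changes when a different day is wanted; count stays the same.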

crosstalk package error in datatable

When attempting to render a html app with crosstalk between leaflet/DT, I get the following error:
Error in datatable(sd, extensions = "Scroller", style = "bootstrap", class = "compact", : 'data' must be 2-dimensional (e.g. data frame or matrix)
data frame:
df2 <- data.frame(
structure(list(lat = c(-20.42, -20.62, -26, -17.97, -20.42, -19.68,
-11.7, -28.11, -28.74, -17.47, -21.44, -12.26, -18.54, -21, -20.7,
-15.94, -13.64, -17.83, -23.5, -22.63), long = c(181.62, 181.03,
184.1, 181.66, 181.96, 184.31, 166.1, 181.93, 181.74, 179.59,
180.69, 167, 182.11, 181.66, 169.92, 184.95, 165.96, 181.5, 179.78,
180.31), depth = c(562L, 650L, 42L, 626L, 649L, 195L, 82L, 194L,
211L, 622L, 583L, 249L, 554L, 600L, 139L, 306L, 50L, 590L, 570L,
598L), mag = c(4.8, 4.2, 5.4, 4.1, 4, 4, 4.8, 4.4, 4.7, 4.3,
4.4, 4.6, 4.4, 4.4, 6.1, 4.3, 6, 4.5, 4.4, 4.4), stations = c(41L,
15L, 43L, 19L, 11L, 12L, 43L, 15L, 35L, 19L, 13L, 16L, 19L, 10L,
94L, 11L, 83L, 21L, 13L, 18L)), .Names = c("lat", "long", "depth",
"mag", "stations"), row.names = c(NA, 20L), class = "data.frame")
)
Reproducible code:
library(crosstalk)
library(leaflet) #devtools::install_github('rstudio/leaflet', force = TRUE)
library(DT)
# Wrap data frame in SharedData
sd <- SharedData$new(quakes[sample(nrow(quakes), 10),])
# Create a filter input
filter_slider("mag", "Magnitude", sd, column=~mag, step=0.1, width=250)
# Use SharedData like a dataframe with Crosstalk-enabled widgets
bscols(
leaflet(sd) %>% addTiles() %>% addMarkers(),
datatable(sd, extensions="Scroller", style="bootstrap", class="compact", width="100%",
options=list(deferRender=TRUE, scrollY=300, scroller=FALSE))
)
And the platform and pkg versions:
R version 3.3.2 (2016-10-31)
crosstalk_1.0.1 (#installed from devtools/github)
leaflet_1.0.2.9010
DT_0.2
Install the development version of DT from GitHub (using devtools):
devtools::install_github('rstudio/DT')

Trying to calculate running team statistics in R - Calculate Avg Offensive yardage before this game as well as that of the opponent

I've been asked to make the distinct problems more clear, so here they are at the top:
How to compute the rolling average for a team, excluding the current week
How to add columns containing similar stats for the opponent team
Here's the original text:
I'm learning R to do some armchair analysis of sports. Right now, I'm stuck on a problem where I have a list of every game played in an NFL season, and I'm trying to calculate what the AvgTotalYds of offense was in the weeks leading up to this game. Eventually, I'd like to be able to do an average for the season-to-date, as well as a moving average of the past X periods.
Further complicating it is that I'd like to also get the same info for the opponent leading up to the week in question. I've searched a lot for a similar problem, but couldn't find any solutions.
Below is a sample of the data. The database I was given has some unfortunate column names. ScoreOff actually refers to the total points scored by the team in the TeamName field, whether they were offensive, defensive, or special teams plays. *Def, likewise, refer to the Opponent. Code examples are using a data frame labeled "df2."
dput(head(df2))
structure(list(Date = structure(c(14126, 14126, 14129, 14129,
14129, 14129), class = "Date"), TeamName = structure(c(21L, 32L,
1L, 2L, 3L, 4L), .Label = c("Arizona Cardinals", "Atlanta Falcons",
"Baltimore Ravens", "Buffalo Bills", "Carolina Panthers", "Chicago Bears",
"Cincinnati Bengals", "Cleveland Browns", "Dallas Cowboys", "Denver Broncos",
"Detroit Lions", "Green Bay Packers", "Houston Texans", "Indianapolis Colts",
"Jacksonville Jaguars", "Kansas City Chiefs", "Miami Dolphins",
"Minnesota Vikings", "New England Patriots", "New Orleans Saints",
"New York Giants", "New York Jets", "Oakland Raiders", "Philadelphia Eagles",
"Pittsburgh Steelers", "San Diego Chargers", "San Francisco 49ers",
"Seattle Seahawks", "St Louis Rams", "Tampa Bay Buccaneers",
"Tennessee Titans", "Washington Redskins"), class = "factor"),
ScoreOff = c(16L, 7L, 23L, 34L, 17L, 34L), FirstDownOff = c(21L,
11L, 18L, 23L, 21L, 13L), ThirdDownPctOff = structure(c(34L,
14L, 20L, 21L, 35L, 16L), .Label = c("0%", "10%", "11%",
"12%", "13%", "14%", "15%", "17%", "18%", "19%", "20%", "21%",
"22%", "23%", "24%", "25%", "27%", "29%", "30%", "31%", "33%",
"35%", "36%", "37%", "38%", "40%", "41%", "42%", "43%", "44%",
"45%", "46%", "47%", "50%", "53%", "54%", "55%", "56%", "57%",
"58%", "59%", "60%", "61%", "62%", "63%", "64%", "65%", "67%",
"69%", "73%", "77%", "8%", "80%", "9%", "92%"), class = "factor"),
RushAttOff = c(32L, 24L, 39L, 42L, 46L, 29L), RushYdsOff = c(154L,
84L, 109L, 318L, 229L, 106L), PassAttOff = c(35L, 27L, 30L,
13L, 29L, 31L), PassCompOff = c(19L, 15L, 19L, 9L, 15L, 20L
), PassYdsOff = c(216L, 133L, 197L, 161L, 129L, 234L), PassIntOff = c(1L,
0L, 0L, 0L, 0L, 0L), FumblesOff = c(0L, 0L, 0L, 0L, 2L, 0L
), SackYdsOff = c(16L, 8L, 21L, 5L, 0L, 2L), PenYdsOff = c(70L,
35L, 40L, 68L, 64L, 14L), TimePossOff = structure(c(348L,
52L, 368L, 175L, 354L, 239L), .Label = c("14:45", "18:15",
"18:27", "19:31", "19:56", "20:11", "20:12", "20:26", "20:48",
"21:03", "21:08", "21:16", "21:26", "21:28", "21:35", "21:44",
"21:45", "21:52", "21:54", "22:03", "22:08", "22:12", "22:16",
"22:25", "22:30", "22:31", "22:33", "22:34", "22:38", "22:39",
"22:53", "22:55", "22:59", "23:09", "23:10", "23:12", "23:15",
"23:23", "23:28", "23:30", "23:33", "23:37", "23:38", "23:42",
"23:43", "23:45", "23:48", "23:49", "23:56", "24:06", "24:13",
"24:17", "24:18", "24:21", "24:33", "24:34", "24:35", "24:41",
"24:43", "24:49", "24:50", "24:54", "24:58", "24:59", "25:01",
"25:02", "25:05", "25:11", "25:14", "25:16", "25:19", "25:25",
"25:29", "25:31", "25:32", "25:34", "25:36", "25:37", "25:38",
"25:40", "25:41", "25:46", "25:47", "25:53", "25:55", "25:57",
"25:58", "26:00", "26:04", "26:09", "26:10", "26:11", "26:12",
"26:13", "26:16", "26:20", "26:27", "26:32", "26:36", "26:37",
"26:38", "26:39", "26:40", "26:41", "26:44", "26:46", "26:49",
"26:53", "26:56", "26:59", "27:01", "27:04", "27:10", "27:12",
"27:13", "27:15", "27:18", "27:20", "27:24", "27:25", "27:26",
"27:27", "27:28", "27:30", "27:32", "27:37", "27:40", "27:44",
"27:46", "27:47", "27:48", "27:50", "27:51", "27:52", "27:53",
"27:55", "27:57", "27:58", "27:59", "28:00", "28:01", "28:03",
"28:05", "28:06", "28:07", "28:13", "28:14", "28:16", "28:17",
"28:18", "28:19", "28:21", "28:22", "28:24", "28:25", "28:28",
"28:29", "28:32", "28:38", "28:40", "28:41", "28:45", "28:47",
"28:49", "28:51", "28:53", "28:55", "28:57", "28:58", "28:59",
"29:00", "29:02", "29:04", "29:05", "29:07", "29:08", "29:11",
"29:13", "29:14", "29:18", "29:19", "29:20", "29:26", "29:27",
"29:29", "29:31", "29:32", "29:33", "29:34", "29:36", "29:37",
"29:38", "29:41", "29:42", "29:43", "29:49", "29:50", "29:55",
"29:56", "29:59", "30:01", "30:04", "30:05", "30:10", "30:11",
"30:17", "30:18", "30:19", "30:22", "30:23", "30:24", "30:26",
"30:27", "30:28", "30:29", "30:31", "30:33", "30:34", "30:40",
"30:41", "30:42", "30:46", "30:47", "30:49", "30:52", "30:53",
"30:55", "30:58", "31:00", "31:01", "31:02", "31:03", "31:05",
"31:07", "31:09", "31:11", "31:13", "31:15", "31:19", "31:20",
"31:22", "31:28", "31:31", "31:32", "31:35", "31:36", "31:38",
"31:39", "31:41", "31:42", "31:43", "31:44", "31:46", "31:47",
"31:53", "31:54", "31:55", "31:57", "31:59", "32:00", "32:01",
"32:02", "32:03", "32:05", "32:07", "32:08", "32:09", "32:10",
"32:12", "32:13", "32:14", "32:16", "32:20", "32:23", "32:28",
"32:30", "32:32", "32:33", "32:34", "32:35", "32:36", "32:40",
"32:42", "32:45", "32:47", "32:48", "32:50", "32:56", "32:59",
"33:01", "33:04", "33:07", "33:11", "33:14", "33:16", "33:19",
"33:20", "33:21", "33:22", "33:23", "33:24", "33:28", "33:33",
"33:40", "33:44", "33:47", "33:48", "33:49", "33:50", "33:51",
"33:56", "34:00", "34:02", "34:03", "34:05", "34:07", "34:13",
"34:14", "34:19", "34:20", "34:22", "34:23", "34:24", "34:26",
"34:28", "34:29", "34:31", "34:35", "34:41", "34:44", "34:46",
"34:49", "34:55", "34:58", "34:59", "35:01", "35:02", "35:06",
"35:10", "35:11", "35:17", "35:19", "35:25", "35:26", "35:27",
"35:39", "35:42", "35:43", "35:47", "35:54", "36:04", "36:11",
"36:12", "36:15", "36:17", "36:18", "36:22", "36:23", "36:27",
"36:30", "36:32", "36:37", "36:45", "36:48", "36:50", "36:51",
"37:01", "37:05", "37:07", "37:21", "37:22", "37:26", "37:27",
"37:29", "37:30", "37:35", "37:42", "37:44", "37:48", "37:52",
"37:57", "38:08", "38:15", "38:16", "38:23", "38:25", "38:32",
"38:34", "38:44", "38:52", "38:57", "39:12", "39:34", "39:48",
"39:49", "40:04", "40:29", "41:33", "41:45", "45:15"), class = "factor"),
PuntAvgOff = c(36.3, 37.9, 45, 38.3, 48.2, 46.6), Opponent = structure(c(32L,
21L, 27L, 11L, 7L, 28L), .Label = c("Arizona Cardinals",
"Atlanta Falcons", "Baltimore Ravens", "Buffalo Bills", "Carolina Panthers",
"Chicago Bears", "Cincinnati Bengals", "Cleveland Browns",
"Dallas Cowboys", "Denver Broncos", "Detroit Lions", "Green Bay Packers",
"Houston Texans", "Indianapolis Colts", "Jacksonville Jaguars",
"Kansas City Chiefs", "Miami Dolphins", "Minnesota Vikings",
"New England Patriots", "New Orleans Saints", "New York Giants",
"New York Jets", "Oakland Raiders", "Philadelphia Eagles",
"Pittsburgh Steelers", "San Diego Chargers", "San Francisco 49ers",
"Seattle Seahawks", "St Louis Rams", "Tampa Bay Buccaneers",
"Tennessee Titans", "Washington Redskins"), class = "factor"),
ScoreDef = c(7L, 16L, 13L, 21L, 10L, 10L), FirstDownDef = c(11L,
21L, 13L, 21L, 8L, 16L), ThirdDownPctDef = structure(c(14L,
34L, 25L, 13L, 7L, 10L), .Label = c("0%", "10%", "11%", "12%",
"13%", "14%", "15%", "17%", "18%", "19%", "20%", "21%", "22%",
"23%", "24%", "25%", "27%", "29%", "30%", "31%", "33%", "35%",
"36%", "37%", "38%", "40%", "41%", "42%", "43%", "44%", "45%",
"46%", "47%", "50%", "53%", "54%", "55%", "56%", "57%", "58%",
"59%", "60%", "61%", "62%", "63%", "64%", "65%", "67%", "69%",
"73%", "77%", "8%", "80%", "9%", "92%"), class = "factor"),
RushAttDef = c(24L, 32L, 20L, 21L, 23L, 21L), RushYdsDef = c(84L,
154L, 108L, 62L, 65L, 85L), PassAttDef = c(27L, 35L, 20L,
33L, 25L, 41L), PassCompDef = c(15L, 19L, 14L, 24L, 10L,
17L), PassYdsDef = c(133L, 216L, 195L, 262L, 99L, 190L),
PassIntDef = c(0L, 1L, 1L, 1L, 1L, 1L), FumblesDef = c(0L,
0L, 4L, 0L, 1L, 1L), SackYdsDef = c(8L, 16L, 12L, 16L, 10L,
23L), PenYdsDef = c(35L, 70L, 20L, 30L, 40L, 30L), TimePossDef = structure(c(52L,
348L, 32L, 225L, 46L, 161L), .Label = c("14:45", "18:15",
"18:27", "19:31", "19:56", "20:11", "20:12", "20:26", "20:48",
"21:03", "21:08", "21:16", "21:26", "21:28", "21:35", "21:44",
"21:45", "21:52", "21:54", "22:03", "22:08", "22:12", "22:16",
"22:25", "22:30", "22:31", "22:33", "22:34", "22:38", "22:39",
"22:53", "22:55", "22:59", "23:09", "23:10", "23:12", "23:15",
"23:23", "23:28", "23:30", "23:33", "23:37", "23:38", "23:42",
"23:43", "23:45", "23:48", "23:49", "23:56", "24:06", "24:13",
"24:17", "24:18", "24:21", "24:33", "24:34", "24:35", "24:41",
"24:43", "24:49", "24:50", "24:54", "24:58", "24:59", "25:01",
"25:02", "25:05", "25:11", "25:14", "25:16", "25:19", "25:25",
"25:29", "25:31", "25:32", "25:34", "25:36", "25:37", "25:38",
"25:40", "25:41", "25:46", "25:47", "25:53", "25:55", "25:57",
"25:58", "26:00", "26:04", "26:09", "26:10", "26:11", "26:12",
"26:13", "26:16", "26:20", "26:27", "26:32", "26:36", "26:37",
"26:38", "26:39", "26:40", "26:41", "26:44", "26:46", "26:49",
"26:53", "26:56", "26:59", "27:01", "27:04", "27:10", "27:12",
"27:13", "27:15", "27:18", "27:20", "27:24", "27:25", "27:26",
"27:27", "27:28", "27:30", "27:32", "27:37", "27:40", "27:44",
"27:46", "27:47", "27:48", "27:50", "27:51", "27:52", "27:53",
"27:55", "27:57", "27:58", "27:59", "28:00", "28:01", "28:03",
"28:05", "28:06", "28:07", "28:13", "28:14", "28:16", "28:17",
"28:18", "28:19", "28:21", "28:22", "28:24", "28:25", "28:28",
"28:29", "28:32", "28:38", "28:40", "28:41", "28:45", "28:47",
"28:49", "28:51", "28:53", "28:55", "28:57", "28:58", "28:59",
"29:00", "29:02", "29:05", "29:07", "29:08", "29:11", "29:13",
"29:14", "29:18", "29:19", "29:20", "29:26", "29:27", "29:29",
"29:31", "29:32", "29:33", "29:34", "29:36", "29:37", "29:38",
"29:41", "29:42", "29:43", "29:49", "29:50", "29:55", "29:56",
"29:59", "30:01", "30:04", "30:05", "30:10", "30:11", "30:17",
"30:18", "30:19", "30:22", "30:23", "30:24", "30:26", "30:27",
"30:28", "30:29", "30:31", "30:33", "30:34", "30:40", "30:41",
"30:42", "30:46", "30:47", "30:49", "30:52", "30:53", "30:55",
"30:56", "30:58", "31:00", "31:01", "31:02", "31:03", "31:05",
"31:07", "31:09", "31:11", "31:13", "31:15", "31:19", "31:20",
"31:22", "31:28", "31:31", "31:32", "31:35", "31:36", "31:38",
"31:39", "31:41", "31:42", "31:43", "31:44", "31:46", "31:47",
"31:53", "31:54", "31:55", "31:57", "31:59", "32:00", "32:01",
"32:02", "32:03", "32:05", "32:07", "32:08", "32:09", "32:10",
"32:12", "32:13", "32:14", "32:16", "32:20", "32:23", "32:28",
"32:30", "32:32", "32:33", "32:34", "32:35", "32:36", "32:40",
"32:42", "32:45", "32:47", "32:48", "32:50", "32:56", "32:59",
"33:01", "33:04", "33:07", "33:11", "33:14", "33:16", "33:19",
"33:20", "33:21", "33:22", "33:23", "33:24", "33:28", "33:33",
"33:40", "33:44", "33:47", "33:48", "33:49", "33:50", "33:51",
"33:56", "34:00", "34:02", "34:03", "34:05", "34:07", "34:13",
"34:14", "34:19", "34:20", "34:22", "34:23", "34:24", "34:26",
"34:28", "34:29", "34:31", "34:35", "34:41", "34:44", "34:46",
"34:49", "34:55", "34:58", "34:59", "35:01", "35:02", "35:06",
"35:10", "35:11", "35:17", "35:19", "35:25", "35:26", "35:27",
"35:39", "35:42", "35:43", "35:47", "35:54", "36:04", "36:11",
"36:12", "36:15", "36:17", "36:18", "36:22", "36:23", "36:27",
"36:30", "36:32", "36:37", "36:45", "36:48", "36:50", "36:51",
"37:01", "37:05", "37:07", "37:21", "37:22", "37:26", "37:27",
"37:29", "37:30", "37:35", "37:42", "37:44", "37:48", "37:52",
"37:57", "38:08", "38:15", "38:16", "38:23", "38:25", "38:32",
"38:34", "38:44", "38:52", "38:57", "39:12", "39:34", "39:48",
"39:49", "40:04", "40:29", "41:33", "41:45", "45:15"), class = "factor"),
Site = structure(c(1L, 3L, 3L, 1L, 1L, 1L), .Label = c("H",
"N", "V"), class = "factor"), Line = c(4.5, -4.5, 2.5, -3,
-2, 1), Totalline = c(41.5, 41.5, 42, 41, 37.5, 38.5), TotalYdsOff = c(370L,
217L, 306L, 479L, 358L, 340L), TotalYdsDef = c(217L, 370L,
303L, 324L, 164L, 275L), ActualLine = c(-9L, 9L, -10L, -13L,
-7L, -24L)), .Names = c("Date", "TeamName", "ScoreOff", "FirstDownOff",
"ThirdDownPctOff", "RushAttOff", "RushYdsOff", "PassAttOff",
"PassCompOff", "PassYdsOff", "PassIntOff", "FumblesOff", "SackYdsOff",
"PenYdsOff", "TimePossOff", "PuntAvgOff", "Opponent", "ScoreDef",
"FirstDownDef", "ThirdDownPctDef", "RushAttDef", "RushYdsDef",
"PassAttDef", "PassCompDef", "PassYdsDef", "PassIntDef", "FumblesDef",
"SackYdsDef", "PenYdsDef", "TimePossDef", "Site", "Line", "Totalline",
"TotalYdsOff", "TotalYdsDef", "ActualLine"), row.names = c(NA,
6L), class = "data.frame")
I added the TotalYds[Off|Def] columns, as that was trivial to do. The closest I got to properly calculating a moving average was with the zoo and plyr libraries and the following command:
ddply(df2, .(TeamName), summarise, rollmean(TotalYdsOff, k=4, fill=0, align="right"))
Which almost does what I want, except that it will use the information for the current week in the average.
As far as getting the matching information for the opponent, I was thinking there'd be a way to pull out the same data from the row where "TeamName" and "Date" both match to the current row's "Opponent" and "Date." This is because the database has two entries on a given game, one for the home team and one for the away (and *Off and *Def are swapped). Look at lines 1 and 2 in the example data, specifically Date, TeamName, and Opponent and you'll understand what I'm trying to say.
Any guidance here? I imagine this is relatively trivial for someone with more than a few days tinkering in R, who would know of some function or library that does this. I, however, am only a few days in, and thus am having some trouble.
A simple way to address question 1 would be to use the ddply call you described, but to pass it a data frame with all of this week's games removed:
require(plyr)
require(zoo)  # provides rollmean()
dfRedacted <- ddply(df2, .(TeamName), function(x) subset(x, Date!=max(Date)))
meanStats <- ddply(dfRedacted, .(TeamName), summarise, rollmean(TotalYdsOff, k=4, fill=0, align="right"))
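An alternative that keeps every game and excludes only the current one from each average is to lag the running mean by one game within each team. The following is a base-R sketch under assumptions: the `games` data frame is made-up toy data, and `prior_mean` and `prior_roll_mean` are hypothetical helper names, not functions from plyr or zoo.

```r
# Made-up toy data: two teams, four weekly games each
games <- data.frame(
  TeamName    = rep(c("A", "B"), each = 4),
  Date        = rep(as.Date("2008-09-07") + 7 * 0:3, times = 2),
  TotalYdsOff = c(370, 306, 479, 358, 217, 340, 300, 410)
)
games <- games[order(games$TeamName, games$Date), ]

# Season-to-date mean of all EARLIER games; NA for a team's first game
prior_mean <- function(x) {
  n <- length(x)
  c(NA, cumsum(x)[-n] / seq_len(n - 1))
}

# Mean of up to k previous games, again excluding the current one
prior_roll_mean <- function(x, k) {
  vapply(seq_along(x), function(i) {
    if (i == 1) return(NA_real_)
    mean(x[max(1, i - k):(i - 1)])
  }, numeric(1))
}

# ave() applies the helper separately within each team
games$AvgTotalYdsOff <- ave(games$TotalYdsOff, games$TeamName,
                            FUN = prior_mean)
games$Roll2YdsOff    <- ave(games$TotalYdsOff, games$TeamName,
                            FUN = function(x) prior_roll_mean(x, k = 2))
```

The rows must be sorted by date within each team before the helpers are applied, which is what the order() call ensures.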
For now, I ended up creating a function to calculate the season average up to (but not including) a given game and putting the results in a separate vector, then just using cbind() to add it to the data frame:
foo <- vector()
for(each in levels(df$TeamName)) {
foo <- c(foo, calc_avg_yds(df, each))
}
df <- cbind(df[order(df$TeamName), ], AvgTotalYdsOff = foo)
As you can see, I reordered the df by TeamName (the secondary key would be Date, which it was already ordered by) to make sure they match up.
To get the info from the corresponding row (the one for the other team in the game), I did a loop and put everything in a vector, then another cbind():
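foo <- vector()  # reset, so values from the previous loop are not carried over
for(i in seq_len(nrow(df))) {  # visit every row; nrow(df) alone would visit only the last one
foo <- c(foo, subset(df, TeamName==df[i,]$Opponent & Date==df[i,]$Date)$AvgTotalYdsOff)
}
df <- cbind(df, AvgTotalYdsDef = foo)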
In the end, I went with the simple, cruder route as I didn't know of better alternative. Hope this helps someone in the future with a similar problem.
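The row-by-row lookup of the opponent's numbers can also be written as a self-join with merge(). This is a minimal sketch under assumptions: the `games` data frame reuses team names and dates from the sample data, but the AvgTotalYdsOff values are made up for illustration.

```r
# Toy data: each game appears twice, once per team (averages are made up)
games <- data.frame(
  Date     = as.Date(c("2008-09-04", "2008-09-04", "2008-09-07", "2008-09-07")),
  TeamName = c("New York Giants", "Washington Redskins",
               "Arizona Cardinals", "San Francisco 49ers"),
  Opponent = c("Washington Redskins", "New York Giants",
               "San Francisco 49ers", "Arizona Cardinals"),
  AvgTotalYdsOff = c(350, 300, 320, 280),
  stringsAsFactors = FALSE
)

# Lookup table: the same averages, keyed so that they match on Opponent
opp <- games[, c("Date", "TeamName", "AvgTotalYdsOff")]
names(opp) <- c("Date", "Opponent", "AvgTotalYdsDef")

# Each row picks up the average of the team it played on that date;
# all.x = TRUE keeps games whose opponent row is missing (as NA)
games <- merge(games, opp, by = c("Date", "Opponent"), all.x = TRUE)
```

Unlike the loop, this does not depend on the row order of the data frame, although merge() returns the rows sorted by the key columns.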
