R - Calculating 12 month moving average on panel data - r

First, full disclosure. I attempted to do this strictly in MS Access with correlated subqueries, and had some help on this post 12 month moving average by person, date. I originally thought my data would be small enough to chug through, but it is awful. As an alternative, I'm going to try running this in R and then writing results to a new table in MS Access. I have data such that I have the following fields:
rep, cyc_date, amt
Following the linked example by Andrie for a rolling 5-year period (as opposed to the 5-year average) R: Calculating 5 year averages in panel data, I am trying to get rolling 12 month average for amt field by rep. Here is my code:
library(zoo)
library(plyr)
library(RODBC)
# Pull data from local MS Access database. The referenced sqlFetch is a query
# that pulls the data, ordered by `rep`, then `cyc_date`
channel <- odbcConnectAccess2007("C://MyDB.accdb")
data <- data.frame(sqlFetch(channel, "MyView"))
# Ensure coercion of `cyc_date` to date type
data$cyc_date <- as.Date(data$cyc_date)
# Function (taken from post above)
rollmean12 <- function(x) {
rollmean(x, 12)
}
# Calculate rolling average by person
rollvec <- ddply(data, .(data$rep), rollmean12(data$amt))
Unfortunately, this doesn't work. I'm getting the following error:
Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress, :
.fun is not a function.
I'm not sure why this is happening. Do I need to explicitly convert data to a zoo object? If so, not sure how to handle the extra dimensionality resulting from the person_id field. Any help would be very much appreciated.

I found this code on the following post: applying rolling mean by group in R
data$movavg <- ave(data$amt, data$rep, FUN = function(x) rollmean(x, k = 12, align = "right", fill = NA))
ave saves the day!

Just some hints, as I don't work at all with time series: ddply requires a data frame input, so don't convert it to a zoo object. .(data$rep) I think should be just .(rep), and rollmean12 should not be called with arguments. Rather, you should re-write the function to extract the columns you want. So, approximately something like this:
rollmean12 <- function(x) rollmean(x$amt, 12)
If you do ?ddply there is a link to a very helpful publication in JSS.
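A minimal sketch of the hints above, on made-up data with a window of 3 instead of 12 so that the result is easy to check by eye. The names toy, roll3 and movavg are illustrative only and do not come from the question. The function passed to ddply receives each rep's rows as a data frame and returns a data frame.

```r
library(plyr)
library(zoo)

# Made-up panel: two reps with four observations each
toy <- data.frame(
  rep = rep(c("A", "B"), each = 4),
  amt = c(2, 4, 6, 8, 1, 1, 1, 1)
)

# Receives the rows for one rep; returns the right-aligned rolling mean,
# padded with NA so the output has one row per input row
roll3 <- function(d) {
  data.frame(movavg = rollmean(d$amt, 3, fill = NA, align = "right"))
}

# Pass the function itself (no parentheses) and group by the bare column name
out <- ddply(toy, .(rep), roll3)
out
```

Note that the function is passed without being called, which is what the ".fun is not a function" error was complaining about.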

Try the tidyquant library (here x is the data frame; to get one average per rep, group it by rep before calling tq_mutate):
x %>% tq_mutate(
# tq_mutate args
select = amt,
mutate_fun = rollapply,
col_rename = "rollmean12", ####
# rollapply args
width = 12,
align = "right",
FUN = mean,
# mean args
na.rm = TRUE
)

Related

I am trying to Calculate for each year the standard deviation and average return in R

I started learning R three days ago so pls bear with me.... if you see any flaws in my code or calculations please call them out.
I have tried this, but get an error message every time:
table.AnnualizedReturns(Apple.Monthly.Returns[, 2:3, drop = FALSE], scale = 12,
Rf = 0, geometric = TRUE, digits = 4)
Error in checkData(R) :
The data cannot be converted into a time series. If you are trying to pass in names from a data object with one column, you should use the form 'data[rows, columns, drop = FALSE]'. Rownames should have standard date formats, such as '1985-03-15'.
As you can clearly see I have no clue what I am doing.
This is every line of code I have written this far:
Dates <- Data_Task2$`Names Date`[1801:2270]
as.numeric(Dates)
Dates <- ymd(Dates)
Monthly.Return <- Data_Task2$Returns[1801:2270]
Monthly.Return <- as.numeric(Monthly.Return)
Apple.Monthly.Returns <- data.frame(Dates, Monthly.Return)
Log.return = log(Monthly.Return + 1)
Apple.Monthly.Returns$Log.return = log(Apple.Monthly.Returns$Monthly.Return + 1)
You should check out the Tidyverse and specifically dplyr (https://dplyr.tidyverse.org/).
This gets you to a good starting point:
https://www.r-bloggers.com/2014/03/using-r-quickly-calculating-summary-statistics-with-dplyr/
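For comparison, here is a base-R sketch of the per-year mean and standard deviation, using tapply rather than dplyr. The returns are made up, and the column names only mirror the ones in the question.

```r
# Made-up monthly returns over two calendar years
returns <- data.frame(
  Dates = seq(as.Date("2019-01-01"), by = "month", length.out = 24),
  Monthly.Return = rep(c(0.01, 0.03), 12)
)

# Derive the year from the date, then summarise within each year
returns$Year <- as.integer(format(returns$Dates, "%Y"))
yearly_mean <- tapply(returns$Monthly.Return, returns$Year, mean)
yearly_sd <- tapply(returns$Monthly.Return, returns$Year, sd)

data.frame(Year = names(yearly_mean), Mean = yearly_mean, SD = yearly_sd)
```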

How to create a function to retrieve multiple cities historical weather using R and DARKSKY api?

I'm trying to retrieve historical weather data for 100 cities in R using DARKSKY API.
The following code works to get historical data for 1 city, however I'm having issues creating a loop function to go through a list of 100 latitude and longitudes and spit out the data.
weather <- function(Long,Lat)
{ a <-seq(Sys.Date()-10, Sys.Date(), "1 day") %>%
map(~get_forecast_for(Long,Lat,.x, units = 'si')) %>%
map_df('daily')
write.csv(a,"blah blah")
}
weather(52.6983,-1.0735)
My initial thought was to upload csv file with all the longitude and latitudes I require. Set them as variables and then map them to the function above.
data <- read.csv("blah blah")
Long <- data$Longitude
Lat <- data$Latitude
map(c("Long","Lat"),weather)
But it keeps bringing back error messages.
Can anyone help please?
Thank you
You are almost there. A couple of things are needed to run the get_forecast_for function once per row. From the purrr package, pmap repeats a function once per row of a data frame, while imap maps a function over each element of a vector together with its index (it is used below to iterate over the dates).
Using this approach, I wrote two functions: weather_at_coords and weather. weather_at_coords sends a request to the Dark Sky API for the weather at a specific location over a given time range (i.e., the last ten days). The weather function repeats weather_at_coords for each row.
I saw that you wanted the nested object daily, so wrote the function to extract that list from the response. I'm assuming that you also wanted the results in a data.frame so I added bind_rows. I added a column id so that rows can be properly linked to a location (or you can add any columns that you like).
# pkgs
library(tidyverse)
library(darksky)
# set API Key: free from https://darksky.net/dev
darksky::darksky_api_key()
# Forecast at a given point and time period
weather_at_coords <- function(...) {
d <- rlang::list2(...)
time <- seq(Sys.Date()-10, Sys.Date(), "1 day")
response <- imap(time, ~ darksky::get_forecast_for(d$lat, d$lon, .x, units = "si")[["daily"]])
out <- bind_rows(response) %>% mutate(id = d$id)
return(out)
}
# primary function (iterates across rows)
weather <- function(data) {
result <- pmap(data, ~ weather_at_coords(...))
return(bind_rows(result))
}
# sample data
d <- data.frame(
id = c("a", "b"),
lat = c(37.8267,34.8267),
lon = c(-122.423, -120.423)
)
# run
x <- weather(d)
x
Notes
Make sure you have the rlang package installed
Adjust the lat and lon variable names as required.
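To see the row-wise iteration without an API key, here is a small sketch in which a hypothetical stand-in function, fake_forecast, replaces the real request. It is not part of the darksky package and simply returns three made-up rows per location. pmap matches the data frame's column names to the function's argument names.

```r
library(purrr)

# Hypothetical stand-in for the API call: one row per requested day
fake_forecast <- function(id, lat, lon) {
  days <- seq(as.Date("2020-01-01"), by = "day", length.out = 3)
  data.frame(id = id, lat = lat, lon = lon, day = days)
}

coords <- data.frame(
  id = c("a", "b"),
  lat = c(37.8267, 34.8267),
  lon = c(-122.423, -120.423),
  stringsAsFactors = FALSE
)

# pmap() calls fake_forecast once per row of coords
per_city <- pmap(coords, fake_forecast)

# Stack the per-city results into one data frame
all_cities <- do.call(rbind, per_city)
all_cities
```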

Merge second columns in different xts objects contained in a list

I am new to R programming and I was trying to work with financial data. Currently I have built multiple xts objects which contain Close, High, Open and Low prices on a daily basis, and I have stored the multiple xts objects in a list (called Data) so that I can access them easily.
Now I want to create a larger xts object which contain only the "Close" prices for all the xts objects (which may have different dates in which case an NA will be the output) in the list
I have tried using apply functions but to no avail.
sapply(Data,function(x) merge(x,"[[", [,"CLOSE"]))
The code does not compile and throws an error, but i thought it should work something like this. Will really appreciate any help. Thank you
You can use merge within Reduce:
# Sample data
library(xts)
set.seed(2018)
dates <- seq(as.Date("2018/01/01"), by = "day", length.out = 5)
Data <- lapply(1:5, function(x)
xts(x = data.frame(Close = rnorm(5), High = rnorm(5)), order.by = dates))
# Merge the "Close" column of every xts object into one wide xts object
Reduce(function(x, y) merge(x, y, all = TRUE), lapply(Data, function(x) x[, "Close"]))
# Close Close.1 Close.2 Close.3 Close.4
#2018-01-01 -0.42298398 -0.6430347 1.2638637 -0.2284119 -0.3594423
#2018-01-02 -1.54987816 -1.0300287 0.2501979 1.1786797 -1.2995363
#2018-01-03 -0.06442932 0.7124813 0.2581954 -0.2662727 -0.8698701
#2018-01-04 0.27088135 -0.4457721 1.7855342 0.5281408 1.0543623
#2018-01-05 1.73528367 0.2489796 -1.2197058 -1.7686592 -0.1486396
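The sample data above shares the same dates across all objects. As a small sketch of the mismatched-dates case mentioned in the question, the two made-up series below (a and b, names chosen only for illustration) overlap on two of their three days, and the outer merge fills the gaps with NA.

```r
library(xts)

# Two made-up series whose date ranges only partly overlap
a <- xts(cbind(Close = c(1, 2, 3)), order.by = as.Date("2018-01-01") + 0:2)
b <- xts(cbind(Close = c(10, 20, 30)), order.by = as.Date("2018-01-02") + 0:2)

# Same Reduce/merge pattern as above; all = TRUE keeps every date
closes <- Reduce(function(x, y) merge(x, y, all = TRUE),
                 list(a[, "Close"], b[, "Close"]))
closes
```

The result has one row per distinct date, with NA where a series has no observation for that day.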

Using a function in R to scrape website, returning "subscript out of bounds" error

I am trying to scrape player data from the Baseball Reference website, using a function to loop through multiple years (variable "year") for each player notated by "playerid."
library(plyr)
library(XML)
fetch_stats <- function(playerid, year) {
url <- paste0("http://www.baseball-reference.com/players/gl.cgi?id=",playerid,"&t=b&year=",year)
data <- readHTMLTable(url, stringsAsFactors = FALSE)
data <- data[[3]]
data$Year <- year
data$PlayerId <- playerid
data
}
This function works perfectly well when it is applied to a single year's worth of data, as seen here:
AdrianGonzales <- ldply("gonzaad01", fetch_stats, year= 2008, .progress="text")
However, as soon as I actually use the function to loop through the multiple years in a players career, it always spits out the following error:
AdrianGonzales <- ldply("gonzaad01", fetch_stats, year= 2009:2004, .progress="text")
Error in data[[3]] : subscript out of bounds
In addition: Warning message:
XML content does not seem to be XML: 'http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2009
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2008
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2007
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2006
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2005
http://www.baseball-reference.com/players/gl.cgi?id=gonzaad01&t=b&year=2004'
From what I have been able to find, the "subscript out of bounds" error happens when you exceed the limits of a defined dataset within R. For this particular function, I may just be dumb, but I don't see how that would apply in this case- or why it would work for a single year, but not for several at a time.
I'm open to any and all suggestions. Thanks ahead of time.
You could just use lapply as in the following way below. I put in a minor fix to fetch_stats as it seems that the 6th column returned has no name. You can do what you like with it, as it is just to show how you can use lapply instead.
library(plyr)
library(XML)
# Minor change made to get function working (naming column 6)
fetch_stats <- function(playerid, year) {
url <- paste0("http://www.baseball-reference.com/players/gl.cgi?id=",playerid,"&t=b&year=",year)
data <- readHTMLTable(url, stringsAsFactors = FALSE)
data <- data[[3]]
data$Year <- year
data$PlayerId <- playerid
### Column six name is empty.
names(data)[6] <- 'EMPTY'
data
}
res <- lapply(2009:2004, function(x) fetch_stats("gonzaad01", x))
resdf <- ldply(res)
This will create a list of 6 elements, one for each year, then convert the list to a data.frame
The way ldply is applied in your code, it does not pass one year at a time; it passes the entire vector of years to fetch_stats at once.
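The difference can be seen without touching the website. In this sketch, build_url is a hypothetical helper that only reproduces the paste0 call from fetch_stats. Because paste0 is vectorised, a vector of years yields a vector of six URLs in a single call, which matches the six URLs listed in the warning message.

```r
# Hypothetical helper: the same paste0 call as in fetch_stats, nothing else
build_url <- function(playerid, year) {
  paste0("http://www.baseball-reference.com/players/gl.cgi?id=",
         playerid, "&t=b&year=", year)
}

# One call with all years: a character vector of six URLs
all_at_once <- build_url("gonzaad01", 2009:2004)
length(all_at_once)

# One call per year: six results, each holding a single URL
one_per_call <- lapply(2009:2004, function(y) build_url("gonzaad01", y))
lengths(one_per_call)
```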
EDIT
After looking a little closer, here is a solution using ldply
new_res <- ldply(.data = 2009:2004,
.fun = function(x) fetch_stats("gonzaad01", x),
.progress="text")
This gave me the same results as the other method above.

Moving window over zoo time series in R

I'm running into issues while applying a moving window function to a time series dataset. I've imported daily streamflow data (date and value) into a zoo object, as approximated by the following:
library(zoo)
df <- data.frame(sf = c("2001-04-01", "2001-04-02", "2001-04-03", "2001-04-04",
"2001-04-05", "2001-04-06", "2001-04-07", "2001-06-01",
"2001-06-02", "2001-06-03", "2001-06-04", "2001-06-05",
"2001-06-06"),
cfs = abs(rnorm(13)))
zoodf <- read.zoo(df, format = "%Y-%m-%d")
Since I want to calculate the 3-day moving minimum for each month I've defined a function using rollapply:
f.3daylow <- function(x){rollapply(x, 3, FUN=min, align = "center")}
I then use aggregate:
aggregate(zoodf, by=as.yearmon, FUN=f.3daylow)
This promptly returns an error message:
Error in zoo(df, ix[!is.na(ix)]) :
“x” : attempt to define invalid zoo object
The problem appears to be that there is an unequal number of data points in each month, since using the same data frame with an additional date for June results in a correct response. Any suggestions for how to deal with this would be appreciated!
Ok, you might be thinking of something like this then. It pastes the results for each month into one data point, so that it can be returned in the aggregate function. Otherwise you may also have a look at ?aggregate.zoo for some more precise data manipulations.
f.3daylow <- function(x){paste(rollapply(x, 3, FUN=min,
align = "center"), collapse=", ")}
data <- aggregate(zoodf, by=as.yearmon, FUN=f.3daylow)
This returns the following. Each month's 3-day rolling minima are pasted into a single data point, so to analyse them you will eventually have to split them apart again, which is why this approach is not recommended.
Apr 2001
0.124581285281643, 0.124581285281643, 0.124581285281643,
0.342222172241979, 0.518874882033892
Jun 2001
0.454158221843514, 0.454158221843514, 0.656966528249837,
0.513613009234435
Eventually you can cut it up again via strsplit(data[1],", "), but see Convert comma separated entry to columns for more details.
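One possible alternative that avoids pasting and splitting strings is to keep each month's result as its own zoo series in a list. This is only a sketch: the flow values are made up (the question uses random numbers), and z, month and lows are illustrative names.

```r
library(zoo)

dates <- as.Date(c("2001-04-01", "2001-04-02", "2001-04-03", "2001-04-04",
                   "2001-04-05", "2001-04-06", "2001-04-07", "2001-06-01",
                   "2001-06-02", "2001-06-03", "2001-06-04", "2001-06-05",
                   "2001-06-06"))
# Made-up flows: seven days in April, six in June
z <- zoo(c(5, 3, 4, 6, 2, 7, 8, 9, 1, 4, 6, 5, 3), dates)

# Group the observation positions by month, then roll within each month
month <- format(time(z), "%Y-%m")
lows <- lapply(split(seq_along(z), month), function(i) {
  rollapply(z[i], 3, FUN = min, align = "center")
})
lows
```

Each list element can have a different length, so months with unequal numbers of days are not a problem here.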
