R data.frame code fails to run... sometimes

I have a data analysis module that I've been using for some time. From the output of a selected model, I can use a data.frame to predict outcomes over a range of values of interest. The following line should create a data.frame. Sometimes it will run, but sometimes the column 'tod' fails to create, and trips an error.
todData <- data.frame(kpsp=rep(c(0,1,0), each=10), tlwma=rep(c(0,0,1), each=10), tod=rep(seq(-0.25,4.25, by=.5),3), tod2=tod^2, doy=36)
This results in the following return:
Error in data.frame(kpsp = rep(c(0, 1, 0), each = 10), tlwma = rep(c(0, :
object 'tod' not found
I did some searching but couldn't get any returns... wasn't even sure how to properly search for such an issue. Thanks for any suggestions on how to make this run consistently.
A.Birdman

The error happens because tod2 is computed from tod, a column that is being created within the same data.frame call. A variable within the data.frame can only be accessed after the data.frame object is created. We can use the data.frame call to create the initial columns and then use mutate (from dplyr), or within or transform (from base R), to create new columns that depend on the initial columns.
todData <- data.frame(kpsp = rep(c(0,1,0), each=10),
                      tlwma = rep(c(0,0,1), each=10),
                      tod = rep(seq(-0.25,4.25, by=.5),3),
                      doy = 36)
todData <- within(todData, {tod2 <- tod^2})
Or
todData <- transform(todData, tod2 = tod^2)

I think it works only when you have already executed tod=rep(seq(-0.25,4.25, by=.5),3) as an individual line somewhere before.
This will work:
tod=rep(seq(-0.25,4.25, by=.5),3)
todData <- data.frame(kpsp=rep(c(0,1,0), each=10), tlwma=rep(c(0,0,1), each=10), tod=tod, tod2=tod^2, doy=36)
Or if you really want to execute this several times with one line, use this function that has a default formula for tod2 (you then won't mention tod2 in your call unless needed):
create.toData <- function(kpsp,tlwma,tod,tod2=tod^2,doy){
data.frame(kpsp=kpsp, tlwma=tlwma, tod=tod, tod2=tod2,doy=doy)
}
todData <- create.toData(kpsp=rep(c(0,1,0), each=10), tlwma=rep(c(0,0,1), each=10), tod=rep(seq(-0.25,4.25, by=.5),3), doy=36)
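To see why the original line only runs "sometimes", here is a minimal base R sketch. The helper name build and the stale values are made up for illustration. It shows the call failing when no tod object exists, silently using the wrong values when an unrelated tod is left over in the workspace, and behaving consistently with transform.

```r
# Hypothetical helper that repeats the pattern from the question:
# tod2 refers to `tod` inside the same data.frame() call.
build <- function() {
  data.frame(tod = rep(seq(-0.25, 4.25, by = 0.5), 3), tod2 = tod^2)
}

# Make sure no leftover `tod` exists in the current environment.
if (exists("tod", inherits = FALSE)) rm(tod)

# With no `tod` object around, the call fails; keep the message instead of stopping.
first_try <- tryCatch(build(), error = function(e) conditionMessage(e))

# With a leftover `tod` in the workspace the call runs, but tod2 is computed
# from the leftover object, not from the new column.
tod <- rep(0, 30)
second_try <- build()

# Creating the dependent column afterwards does not depend on the workspace.
safe <- data.frame(tod = rep(seq(-0.25, 4.25, by = 0.5), 3))
safe <- transform(safe, tod2 = tod^2)
```

In second_try the tod column holds the intended sequence while tod2 is all zeros, which is arguably worse than the error because nothing flags it.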

Related

I am trying to Calculate for each year the standard deviation and average return in R

I started learning R three days ago so pls bear with me.... if you see any flaws in my code or calculations please call them out.
I have tried this, but get an error message every time:
table.AnnualizedReturns(Apple.Monthly.Returns[, 2:3, drop = FALSE], scale = 12,
Rf = 0, geometric = TRUE, digits = 4)
Error in checkData(R) :
The data cannot be converted into a time series. If you are trying to pass in names from a data object with one column, you should use the form 'data[rows, columns, drop = FALSE]'. Rownames should have standard date formats, such as '1985-03-15'.
As you can clearly see I have no clue what I am doing.
This is every line of code I have written this far:
Dates <- Data_Task2$`Names Date`[1801:2270]
as.numeric(Dates)
Dates <- ymd(Dates)
Monthly.Return <- Data_Task2$Returns[1801:2270]
Monthly.Return <- as.numeric(Monthly.Return)
Apple.Monthly.Returns <- data.frame(Dates, Monthly.Return)
Log.return = log(Monthly.Return + 1)
Apple.Monthly.Returns$Log.return = log(Apple.Monthly.Returns$Monthly.Return + 1)
You should check out the Tidyverse and specifically dplyr (https://dplyr.tidyverse.org/).
This gets you to a good starting point:
https://www.r-bloggers.com/2014/03/using-r-quickly-calculating-summary-statistics-with-dplyr/
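As a starting point that needs no extra packages, here is a minimal base R sketch of the per-year mean and standard deviation. The data are hypothetical stand-ins for the Dates and Monthly.Return columns built in the question, and the Year column is an assumption added for grouping.

```r
# Hypothetical monthly returns for three calendar years (not the real data).
Apple.Monthly.Returns <- data.frame(
  Dates = seq(as.Date("2018-01-01"), by = "month", length.out = 36),
  Monthly.Return = rep(c(0.01, -0.02, 0.03), 12)
)

# Derive the calendar year from the date column.
Apple.Monthly.Returns$Year <- format(Apple.Monthly.Returns$Dates, "%Y")

# Mean and standard deviation of the monthly returns within each year.
annual <- data.frame(
  Year = sort(unique(Apple.Monthly.Returns$Year)),
  mean = as.numeric(tapply(Apple.Monthly.Returns$Monthly.Return,
                           Apple.Monthly.Returns$Year, mean)),
  sd   = as.numeric(tapply(Apple.Monthly.Returns$Monthly.Return,
                           Apple.Monthly.Returns$Year, sd))
)
annual
```

The dplyr approach from the links does the same grouping with group_by and summarise. Note that the error message in the question asks for dates as row names or a time series, and selecting columns 2:3 leaves the Dates column behind.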

R forecastML package keeps renaming outcome columns

I am trying to use the forecastML R package to run some tests, but the moment I hit this step, it renames the columns:
data <- read.csv("C:\\Users\\User\\Desktop\\DG ST Forecast\\LassoTemporalForecast.csv", header=TRUE)
date_frequency <- "1 week"
dates <- seq(as.Date("2012-10-05"), as.Date("2020-10-05"), by = date_frequency)
data_train <- data[1:357,]
data_test <- data[358:429,]
outcome_col <- 1 # The column index of our DriversKilled outcome.
horizons <- c(1,2,3,4,5,6,7,8,9,10,11,12) # 4 models that forecast 1, 1:3, 1:6, and 1:12 time steps ahead.
# A lookback across select time steps in the past. Feature lags 1 through 9, for instance, will be
# silently dropped from the 12-step-ahead model.
lookback <- c(1)
# A non-lagged feature that changes through time whose value we either know (e.g., month) or whose
# value we would like to forecast.
dynamic_features <- colnames(data_train)
data_list <- forecastML::create_lagged_df(data_train,
type = "train",
outcome_col = 1,
horizons = horizons,
lookback = lookback,
date = dates[1:nrow(data_train)],
frequency = date_frequency,
dynamic_features = colnames(data_train)
)
After the data_list step, here is a snapshot of what happens in the console: (console snapshot not included)
Next, when I try to create windows following the name change,
windows <- forecastML::create_windows(lagged_df = data_list, window_length = 36,
window_start = NULL, window_stop = NULL,
include_partial_window = TRUE)
plot(windows, data_list, show_labels = TRUE)
I get this error: Can't subset columns that don't exist. x Column cases doesn't exist.
I've checked my input data and the preceding code many times and still can't understand why the name change occurs. If anyone is familiar with this package, please assist. Thank you!
I'm the package author. It's difficult to tell without a reproducible example, but here's what I think is going on: Dynamic features are essentially features with a lag of 0. Dynamic features also retain their original names, as opposed to lagged features which have "_lag_n" appended to the feature name. So by setting dynamic_features to all column names you are getting duplicate columns specifically for the outcome column. My guess is that "cases" is the outcome here. Fix this by removing dynamic_features = colnames(data_train) and setting it to only those features that you really want to have a lag of 0.
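The suggested fix can be sketched in base R as below. The data frame and its column names (cases, temperature, week_of_year) are hypothetical, since the real CSV is not shown, and the choice of week_of_year as the only lag-0 feature is an assumption for illustration.

```r
# Hypothetical training data; "cases" stands in for the outcome column.
data_train <- data.frame(
  cases        = c(12, 15, 9, 20),
  temperature  = c(21.5, 23.0, 19.8, 25.1),
  week_of_year = c(40, 41, 42, 43)
)
outcome_col  <- 1
outcome_name <- colnames(data_train)[outcome_col]

# What the question passes: every column, which includes the outcome itself.
requested <- colnames(data_train)
outcome_is_dynamic <- outcome_name %in% requested  # TRUE signals the problem

# Keep only features that really should enter with a lag of 0,
# i.e. features whose future values are known in advance.
dynamic_features <- "week_of_year"
```

The resulting dynamic_features vector would then be passed to create_lagged_df in place of colnames(data_train).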

Overwriting previous iterations of if statement in R

I am VERY new to R and am having a very difficult time getting an answer to this, so I finally caved to post - so apologies ahead of time.
I am using a genetic algorithm to optimize the shape of an object, and want to gather the intermediate steps for prototyping. The package I am using, genalg, allows a monitor function to track the data, which I can print just fine. But I'd like to stash it in a data frame for other uses, and I keep watching it overwrite the other iterations. Here's my code for the monitor function:
monitor <- function(obj){
#Make empty data frame in which to store data
resultlist <- data.frame(matrix(nrow = 200, ncol = 10, byrow = TRUE))
#If statement evaluating each iteration of algorithm
if (obj$iter > 0){
#Put results into list corresponding to number of iteration
resultlist[,obj$iter] <- obj$population[which.min(obj$best),]}
#Make data frame available at global level for prototyping, output, etc.
resultlistOutput <<- resultlist}
I know this works in a for loop with no issues based on searches, so I must be doing something wrong or the if syntax is not capable of this?
Sincere thanks in advance for your time.
Since I'm not sure what error you are getting, I am guessing you are getting only the result from the last iteration. This happens because you are overwriting your global data frame in each call to the monitor function. You should first initialize it outside the function with resultlistOutput <<- data.frame() and then do this:
monitor <- function(obj){
#Make empty data frame in which to store data
resultlist <- data.frame(matrix(nrow = 200, ncol = 10, byrow = TRUE))
#If statement evaluating each iteration of algorithm
if (obj$iter > 0){
#Put results into list corresponding to number of iteration
resultlist[,obj$iter] <- obj$population[which.min(obj$best),]}
#Make data frame available at global level for prototyping, output, etc.
# append the dataframe to the old result
resultlistOutput <<- rbind(resultlistOutput , resultlist)
}
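An alternative that avoids growing a global data frame is to allocate the storage once, outside the monitor, and fill one column per iteration. The following is only a sketch: make_monitor is a hypothetical helper, the obj lists are hand-made stand-ins for what the package passes in, and the use of an evaluations field (one score per chromosome) to pick the best row is an assumption rather than something taken from the question.

```r
# Hypothetical factory: returns a monitor function plus a getter for the results.
make_monitor <- function(n_genes, max_iter) {
  # Allocated once, so later calls do not wipe earlier iterations.
  history <- matrix(NA_real_, nrow = n_genes, ncol = max_iter)

  monitor <- function(obj) {
    if (obj$iter > 0 && obj$iter <= max_iter) {
      # Assumption: obj$evaluations holds one score per row of obj$population.
      best_row <- which.min(obj$evaluations)
      history[, obj$iter] <<- obj$population[best_row, ]
    }
    invisible(NULL)
  }

  list(monitor = monitor, history = function() history)
}

# Simulated calls standing in for two iterations of the algorithm.
tracker <- make_monitor(n_genes = 3, max_iter = 2)
tracker$monitor(list(iter = 1,
                     population = matrix(1:6, nrow = 2, byrow = TRUE),
                     evaluations = c(5, 2)))
tracker$monitor(list(iter = 2,
                     population = matrix(7:12, nrow = 2, byrow = TRUE),
                     evaluations = c(1, 9)))
result <- tracker$history()
```

Here tracker$monitor would be supplied wherever the original monitor function was passed, and tracker$history() retrieves the matrix once the run finishes.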

R ncdf package - put.var.ncdf requiring incorrect number of dimensions

I am organizing weather data into netCDF files in R. Everything goes fine until I try to populate the netcdf variables with data, because it is asking me to specify only one dimension for two-dimensional variables.
library(ncdf)
These are the dimension tags for the variables. Each variable uses the Threshold dimension and one of the other two dimensions.
th <- dim.def.ncdf("Threshold", "level", c(5,6,7,8,9,10,50,75,100))
rt <- dim.def.ncdf("RainMinimum", "cm", c(5, 10, 25))
wt <- dim.def.ncdf("WindMinimum", "m/s", c(18, 30, 50))
The variables are created in a loop, and there are a lot of them, so for the sake of easy understanding, in my example I'll only populate the list of variables with one variable.
vars <- list()
v1 <- var.def.ncdf("ARMM_rain", "percent", list(th, rt), -1, prec="double")
vars[[length(vars)+1]] <- v1
ncdata <- create.ncdf("composite.nc", vars)
I use another loop to extract data from different data files into a 9x3 data frame named subframe while iterating through the variables of the netcdf file with varindex. For the sake of reproducing, I'll give a quick initialization for these values.
varindex <- 1
subframe <- data.frame(matrix(nrow=9, ncol=3, rep(.01, 27)))
The desired outcome from there is to populate each ncdf variable with the contents of subframe. The code to do so is:
for(x in 1:9) {
for(y in 1:3) {
value <- ifelse(is.na(subframe[x,y]), -1, subframe[x,y])
put.var.ncdf(ncdata, varindex, value, start=c(x,y), count=1)
}
}
The error message is:
Error in put.var.ncdf(ncdata, varindex, value, start = c(x, y), count = 1) :
'start' should specify 1 dims but actually specifies 2
tl;dr: I have defined two-dimensional variables using ncdf in R, I am trying to write data to them, but I am getting an error message because R believes they are single-dimensional variables instead.
Anyone know how to fix this error?

R - Calculating 12 month moving average on panel data

First, full disclosure. I attempted to do this strictly in MS Access with correlated subqueries, and had some help on this post 12 month moving average by person, date. I originally thought my data would be small enough to chug through, but it is awful. As an alternative, I'm going to try running this in R and then writing results to a new table in MS Access. I have data such that I have the following fields:
rep, cyc_date, amt
Following the linked example by Andrie for a rolling 5-year period (as opposed to the 5-year average), R: Calculating 5 year averages in panel data, I am trying to get a rolling 12-month average of the amt field by rep. Here is my code:
library(zoo)
library(plyr)
library(RODBC)
# Pull data from local MS Access database. The referenced sqlFetch is a query
# that pulls the data, ordered by `rep`, then `cyc_date`
channel <- odbcConnectAccess2007("C://MyDB.accdb")
data <- data.frame(sqlFetch(channel, "MyView"))
# Ensure coercion of `cyc_date` to date type
data$cyc_date <- as.Date(data$cyc_date)
# Function (taken from post above)
rollmean12 <- function(x) {
rollmean(x, 12)
}
# Calculate rolling average by person
rollvec <- ddply(data, .(data$rep), rollmean12(data$amt))
Unfortunately, this doesn't work. I'm getting the following error:
Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress, :
.fun is not a function.
I'm not sure why this is happening. Do I need to explicitly convert data to a zoo object? If so, not sure how to handle the extra dimensionality resulting from the person_id field. Any help would be very much appreciated.
I found this code on the following post: applying rolling mean by group in R
data$movavg <- ave(data$amt, data$rep, FUN = function(x) rollmean(x, k=12, align="right", fill=NA))
ave saves the day!
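For anyone without zoo installed, the same per-rep trailing average can be sketched in base R with ave and stats::filter. The panel below is hypothetical (two reps, 13 monthly cycles each), and trailing_mean is a made-up helper name.

```r
# Hypothetical panel data with the three fields from the question.
data <- data.frame(
  rep      = rep(c("A", "B"), each = 13),
  cyc_date = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 13), 2),
  amt      = c(1:13, rep(2, 13))
)

# Trailing k-period mean; the first k - 1 positions are NA.
trailing_mean <- function(x, k = 12) {
  # stats::filter stops if the window is longer than the series.
  if (length(x) < k) return(rep(NA_real_, length(x)))
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Rows must be in date order within each rep before rolling.
data <- data[order(data$rep, data$cyc_date), ]
data$movavg <- ave(data$amt, data$rep, FUN = trailing_mean)
```

Because ave applies the function within each rep separately, the window never spans two reps.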
Just some hints, as I don't work at all with time series: ddply requires a data frame input, so don't convert it to a zoo object. .(data$rep) I think should be just .(rep), and rollmean12 should not be called with arguments. Rather, you should re-write the function to extract the columns you want. So, approximately something like this:
rollmean12 <- function(x) rollmean(x$amt, 12)
If you do ?ddply there is a link to a very helpful publication in JSS.
Try the tidyquant library
x %>% tq_mutate(
    # tq_mutate args
    select     = amt,
    mutate_fun = rollapply,
    col_rename = "rollmean12",
    # rollapply args
    width      = 12,
    align      = "right",
    FUN        = mean,
    # mean args
    na.rm      = TRUE
)

Resources