how to develop the crossbasis using a binary term - r

The indicator hw is used to show if a day is a heat wave day. I intend to study the exposure-lag relationship of heatwave using dlnm in R. When I developed the crossbasis and predicted the results, I got an error as following:
Error in crosspred(hw.basis, model) :
coef/vcov not consistent with basis matrix. See help(crosspred)
My code:
hw.knots <- equalknots(hw, fun="ns",df=4)
hw.logknots <- logknots(10,fun="ns",df=4,intercept=TRUE)
hw.basis <- crossbasis(hw,lag=10, argvar=list(fun="ns",df=4), arglag=list(knots=hw.logknots))
model <- glm(ntr ~ hw.basis+ dow + ns(time,df=7*7),
family=quasipoisson(), data)
hw.pred<-crosspred(hw.basis,model)

Related

Cannot use coxph.predict for type="expected" with newdata in Competing Risks context

I'm using a Cox Proportional Hazards (survival::coxph) model in a competing risks context- i.e. multiple event types with one endpoint for each observation. I'm having a hard time using the coxph.predict function to show an estimate of expected number of events given a supplied set of covariates and follow-up time.
Here is an example using the mgus2 dataset in the survival package:
library(survival)
#Modify data so each subject transitions only once to a state.
crdata <- mgus2
crdata$etime <- pmin(crdata$ptime, crdata$futime)
crdata$event <- ifelse(crdata$pstat==1, 1, 2*crdata$death)
crdata$event <- factor(crdata$event, 0:2, c("censor", "PCM", "death"))
cfit <- coxph(Surv(etime, event) ~ I(age/10) + sex + mspike,
id = id, crdata)
Once I fit a model, and create a "newdata" data frame, R throws an error.
I tried using a from-scratch dataframe but this results in an error suggesting that the column size or the number of rows does not mesh:
#providing both follow-up time and covariates
nd=data.frame(etime=81 ,sex= "M", age=60, mspike=1.2)
predict(cfit, newdata=nd ,type="expected")
> Data is not the same size as it was in the original fit
I get the same issue Using model.frame when extracting the same data.frame used fitting the model.
nd=model.frame(cfit)
predict(cfit,newdata=nd,type="expected")
> Data is not the same size as it was in the original fit
This results in the same error. Trying to use the original data frame to make predictions doesn't work either:
nd=crdata[1,]
predict(cfit,newdata=nd,type="expected")
> Data is not the same size as it was in the original fit
I'm wondering what I'm missing here. Thanks in advance!
I've updated my survival package from 2.7 to 3.1 and the error thrown states that "expected" predict type is not available for multistate coxph.
> predict(fit,type="expected",newdata=newdat)
Error in predict.coxphms(fit, type = "expected", newdata = newdat) :
predict method not yet available for multistate coxph

Forecasting of multivariate data through Vector Autoregression model

I am working in the functional time series using the multivariate time series data(hourly time series data). I am using FAR model more than one order for which no statistical package is available in R, so for this I convert my data into functional form and obtained the functional principle component and from those FPCA I extract their corresponding** FPCscores**. Know I use the VAR model on those FPCscores for the forecasting of each 24 hours through the VAR model, but the VAR give me the forecasted value for all 23hours when I put phat=23, but whenever I put phat=24 for example want to predict each 24 hours its give the results in the form of NA. the code is given below
library(vars)
library(fda)
fdata<- function(mat){
nb = 27 # number of basis functions for the data
fbf = create.fourier.basis(rangeval=c(0,1), nbasis=nb) # basis for data
args=seq(0,1,length=24)
fdata1=Data2fd(args,y=t(mat),fbf) # functions generated from discretized y
return(fdata1)
}
prediction.ffpe = function(fdata1){
n = ncol(fdata1$coef)
D = nrow(fdata1$coef)
#center the data
#mu = mean.fd(fdata1)
data = center.fd(fdata1)
#ffpe = fFPE(fdata1, Pmax=10)
#p.hat = ffpe[2] #order of the model
d.hat=23
p.hat=6
#fPCA
fpca = pca.fd(data,nharm=D, centerfns=TRUE)
scores = fpca$scores[,0:d.hat]
# to avoid warnings from vars predict function below
colnames(scores) <- as.character(seq(1:d.hat))
VAR.pre= predict(VAR(scores, p.hat), n.ahead=1, type="const")$fcst
}
kindly guide me that how can I solve out my problem or what error I doing. THANKS

arima model for multiple seasonalities in R

I'm learning to create a forecasting model for time series that has multiple seasonalities. Following is the subset of dataset that I'm refering to. This dataset includes hourly data points and I wish to include daily as well as weekly seasonalities in my arima model. Following is the subset of dataset:
data= c(4,4,1,2,6,21,105,257,291,172,72,10,35,42,77,72,133,192,122,59,29,25,24,5,7,3,3,0,7,15,91,230,284,147,67,53,54,55,63,73,114,154,137,57,27,31,25,11,4,4,4,2,7,18,68,218,251,131,71,43,55,62,63,80,120,144,107,42,27,11,10,16,8,10,7,1,4,3,12,17,58,59,68,76,91,95,89,115,107,107,41,40,25,18,14,15,6,12,2,4,1,6,9,14,43,67,67,94,100,129,126,122,132,118,68,26,19,12,9,5,4,2,5,1,3,16,89,233,304,174,53,55,53,52,59,92,117,214,139,73,37,28,15,11,8,1,2,5,4,22,103,258,317,163,58,29,37,46,54,62,95,197,152,58,32,30,17,9,8,1,3,1,3,16,109,245,302,156,53,34,47,46,54,65,102,155,116,51,30,24,17,10,7,4,8,0,11,0,2,225,282,141,4,87,44,60,52,74,135,157,113,57,44,26,29,17,8,7,4,4,2,10,57,125,182,100,33,27,41,39,35,50,69,92,66,30,11,10,11,9,6,5,10,4,1,7,9,17,24,21,29,28,48,38,30,21,26,25,35,10,9,4,4,4,3,5,4,4,4,3,5,10,16,28,47,63,40,49,28,22,18,27,18,10,5,8,7,3,2,2,4,1,4,19,59,167,235,130,57,45,46,42,40,49,64,96,54,27,17,18,15,7,6,2,3,1,2,21,88,187,253,130,77,47,49,48,53,77,109,147,109,45,41,35,16,13)
The code I'm trying to use is following:
tsdata = ts (data, frequency = 24)
aicvalstemp = NULL
aicvals= NULL
for (i in 1:5) {
for (j in 1:5) {
xreg1 = fourier(tsdata,i,24)
xreg2 = fourier(tsdata,j,168)
xregs = cbind(xreg1,xreg2)
armodel = auto.arima(bike_TS_west, xreg = xregs)
aicvalstemp = cbind(i,j,armodel$aic)
aicvals = rbind(aicvals,aicvalstemp)
}
}
The cbind command in the above command fails because the number of rows in xreg1 and xreg2 are different. I even tried using 1:length(data) argument in the fourier function but that also gave me an error. If someone can rectify the mistakes in the above code to produce a forecast of next 24 hours using an arima model with minimum AIC values, it would be really helpful. Also if you can include datasplitting in your code by creating training and testing data sets, it would be totally awesome. Thanks for your help.
I don't understand the desire to fit a weekly "season" to these data as there is no evidence for one in the data subset you provided. Also, you should really log-transform the data because they do not reflect a Gaussian process as is.
So, here's how you could fit models with a some form of hourly signals.
## the data are not normal, so log transform to meet assumption of Gaussian errors
ln_dat <- log(tsdata)
## number of hours to forecast
hrs_out <- 24
## max number of Fourier terms
max_F <- 5
## empty list for model fits
mod_res <- vector("list", max_F)
## fit models with increasing Fourier terms
for (i in 1:max_F) {
xreg <- fourier(ln_dat,i)
mod_res[[i]] <- auto.arima(tsdata, xreg = xreg)
}
## table of AIC results
aic_tbl <- data.frame(F=seq(max_F), AIC=sapply(mod_res, AIC))
## number of Fourier terms in best model
F_best <- which(aic_tbl$AIC==min(aic_tbl$AIC))
## forecast from best model
fore <- forecast(mod_res[[F_best]], xreg=fourierf(ln_dat,F_best,hrs_out))

R random forest - training set using target column for prediction

I am learning how to use various random forest packages and coded up the following from example code:
library(party)
library(randomForest)
set.seed(415)
#I'll try to reproduce this with a public data set; in the mean time here's the existing code
data = read.csv(data_location, sep = ',')
test = data[1:65] #basically data w/o the "answers"
m = sample(1:(nrow(factor)),nrow(factor)/2,replace=FALSE)
o = sample(1:(nrow(data)),nrow(data)/2,replace=FALSE)
train2 = data[m,]
train3 = data[o,]
#random forest implementation
fit.rf <- randomForest(train2[,66] ~., data=train2, importance=TRUE, ntree=10000)
Prediction.rf <- predict(fit.rf, test) #to see if the predictions are accurate -- but it errors out unless I give it all data[1:66]
#cforest implementation
fit.cf <- cforest(train3[,66]~., data=train3, controls=cforest_unbiased(ntree=10000, mtry=10))
Prediction.cf <- predict(fit.cf, test, OOB=TRUE) #to see if the predictions are accurate -- but it errors out unless I give it all data[1:66]
Data[,66] is the is the target factor I'm trying to predict, but it seems that by using "~ ." to solve for it is causing the formula to use the factor in the prediction model itself.
How do I solve for the dimension I want on high-ish dimensionality data, without having to spell out exactly which dimensions to use in the formula (so I don't end up with some sort of cforest(data[,66] ~ data[,1] + data[,2] + data[,3}... etc.?
EDIT:
On a high level, I believe one basically
loads full data
breaks it down to several subsets to prevent overfitting
trains via subset data
generates a fitting formula so one can predict values of target (in my case data[,66]) given data[1:65].
so my PROBLEM is now if I give it a new set of test data, let’s say test = data{1:65], it now says “Error in eval(expr, envir, enclos) :” where it is expecting data[,66]. I want to basically predict data[,66] given the rest of the data!
I think that if the response is in train3 then it will be used as a feature.
I believe this is more like what you want:
crtl <- cforest_unbiased(ntree=1000, mtry=3)
mod <- cforest(iris[,5] ~ ., data = iris[,-5], controls=crtl)

Predict likelihood of each failure type with competing risks model in R

I'm looking to run a competing risks model on historical data and predict the likelihood of each type of death in a new dataset for a specified period (let's say one period). So far I've looked into comp.risk in the timereg package and crr in cmprsk, but am having trouble figuring out how to use their predict methods to return these likelihoods.
Using the bmt dataset (from timereg package) and comp.risk as an example, I'd like to do something like:
m <- comp.risk(Surv(time, cause>0)~platelet+age+tcell, data=bmt,
bmt$cause, causeS=1, resample.iid=1)
ndata <- data.frame(platelet=c(1,0,0), age=c(0,1,0), tcell=c(0,0,1),
start.time=c(1, 1, 1), end.time=c(2, 2, 2))
out <- predict(m, newdata=ndata)
This would ideally predict the likelihood of each type of death between t=1 and t=2, but the predict function returns other types of results.
The final line won't work because the model wasn't built with the start/stop Survival object type and comp.risk doesn't seem to take left-truncated data. To illustrate, the below model statement including start times returns the error Error in comp.risk(Surv(start.time, time, cause > 0) ~ platelet + age + :
only right censored data.
m <- comp.risk(Surv(rep(0, nrow(bmt)), time, cause>0)~platelet+age+tcell, data=bmt,
bmt$cause, causeS=1, resample.iid=1)

Resources