Related
I am having a small issue with my Rstudio code. I will try to replicate my code but unfortunately there is no easy data for me to show. This is about the package forecast. What I am looking for is somehwat simpler for what is in the manual. But unfortunately, I am not able to work round it.
so the issue is with an expanding window forecast. So I have a dependent variable Y and 3 regressors (X). I am trying to build a recursive one steap ahead forecast for each X.
Here is my code.
library(forecast)
library(zoo)
library(timeDate)
library(xts)
## Load data
data = Dataset[,2:ncol(Dataset)]
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
data = xts(data, order.by=tt)
##########################################################################
RECFORECAST=function (Y,X,h,window){
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
datas= cbind(Y,X)
newfcast= matrix(0,nrow(datas),h)
for (k in 1:nrow(datas)){
sample =datas[1:(window+k-1),]
# print(sample)
v= window+k
# print(v)
# fit = Arima(sample[,1], order=c(0,0,0),xreg=sample[,2])
fit = lm(sample[,1]~sample[,2], data = sample)
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
print(fcast)
# print(fcast)
# newfcast[k+window+1,]=fcast
}
print(newfcast)
return(newfcast)
}
## Code to send the loop into forecasts
StoreMatrix = data$growth ## This is the first column data[,1]
for (i in 2:4)
{
try({
X=data[,i]
Y=data[,1]
RecModel=RECFORECAST(Y,X,h=1,window=60) ##Here the initial window is 60 obs
StoreMatrix=cbind(StoreMatrix,RecModel)
print(StoreMatrix)
}, silent=T)
}
The bits # were different ways I tried to crosscheck my data and they may not be useful. I have tried so many things but I don't seem to be able to get my head through it. At the end I want to have a matrix (StoreMatrix) with the first variable being the realization, and each of the columns with the corresponding 1 step ahead forecast.
The main lines where there seems to be an issue are these ones:
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
Note sure how to solve this. Thank you very much.
I have a question regarding LDA in topicmodels in R.
I created a matrix with documents as rows, terms as columns, and the number of terms in a document as respective values from a data frame. While I wanted to start LDA, I got an Error Message stating "Error in !all.equal(x$v, as.integer(x$v)) : invalid argument type" . The data contains 1675 documents of 368 terms. What can I do to make the code work?
library("tm")
library("topicmodels")
data_matrix <- data %>%
group_by(documents, terms) %>%
tally %>%
spread(terms, n, fill=0)
doctermmatrix <- as.DocumentTermMatrix(data_matrix, weightTf("data_matrix"))
lda_head <- topicmodels::LDA(doctermmatrix, 10, method="Gibbs")
Help is much appreciated!
edit
# Toy Data
documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1)
meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10)
termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar")
toydata <- data.frame(documentstoy,meta1toy,meta2toy,termstoy)
So I looked inside the code and apparently the lda() function only accepts integers as the input so you have to convert your categorical variables as below:
library('tm')
library('topicmodels')
documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1)
meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10)
toydata <- data.frame(documentstoy,meta1toy,meta2toy)
termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar")
toy_unique = unique(termstoy)
for (i in 1:length(toy_unique)){
A = as.integer(termstoy == toy_unique[i])
toydata[toy_unique[i]] = A
}
lda_head <- topicmodels::LDA(toydata, 10, method="Gibbs")
I am trying to estimate the static yield curve for Brazil using termstrc package in R. I am using the function estim_nss.couponbonds and putting 0% coupon-rates and $0 cash-flows, except for the last one which is $1000 (the face-value at maturity) -- as far as I know this is the function to do this, because the estim_nss.zeroyields only calculates the dynamic curve. The problem is that I receive the following error message:
"Error in (pos_cf[i] + 1):pos_cf[i + 1] : NA/NaN argument In addition: Warning message: In max(n_of_cf) : no non-missing arguments to max; returning -Inf "
I've tried to trace the problem using trace(estim_nss.couponbons, edit=T) but I cannot find where pos_cf[i]+1 is calculated. Based on the name I figured it could come from the postpro_bondfunction and used trace(postpro_bond, edit=T), but I couldn't find the calculation again. I believe "cf" comes from cashflow, so there could be some problem in the calculation of the cashflows somehow. I used create_cashflows_matrix to test this theory, but it works well, so I am not sure the problem is in the cashflows.
The code is:
#Creating the 'couponbond' class
ISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019', 'ltn_2021','ltn_2023')) #Bond's identification
MATURITYDATE <- as.Date(c(42736, 43101, 43466, 44197, 44927), origin='1899-12-30') #Dates are in system's format
ISSUEDATE <- as.Date(c(41288,41666,42395, 42073, 42395), origin='1899-12-30') #Dates are in system's format
COUPONRATE <- rep(0,5) #Coupon rates are 0 because these are zero-coupon bonds
PRICE <- c(969.32, 867.77, 782.48, 628.43, 501.95) #Prices seen 'TODAY'
ACCRUED <- rep(0.1,5) #There is no accrued interest in the brazilian bond's market
#Creating the cashflows sublist
CFISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019', 'ltn_2021', 'ltn_2023')) #Bond's identification
CF <- c(1000,1000,1000,1000,1000)# The face-values
DATE <- as.Date(c(42736, 43101, 43466, 44197, 44927), origin='1899-12-30') #Dates are in system's format
CASHFLOWS <- list(CFISIN,CF,DATE)
names(CASHFLOWS) <- c("ISIN","CF","DATE")
TODAY <- as.Date(42646, origin='1899-12-30')
brasil <- list(ISIN,MATURITYDATE,ISSUEDATE,
COUPONRATE,PRICE,ACCRUED,CASHFLOWS,TODAY)
names(brasil) <- c("ISIN","MATURITYDATE","ISSUEDATE","COUPONRATE",
"PRICE","ACCRUED","CASHFLOWS","TODAY")
mybonds <- list(brasil)
class(mybonds) <- "couponbonds"
#Estimating the zero-yield curve
ns_res <-estim_nss.couponbonds(mybonds, 'brasil' ,method = "ns")
#Testing the hypothesis that the error comes from the cashflow matrix
cf_p <- create_cashflows_matrix(mybonds[[1]], include_price = T)
m_p <- create_maturities_matrix(mybonds[[1]], include_price = T)
b <- bond_yields(cf_p,m_p)
Note that I am aware of this question which reports the same problem. However, it is for the dynamic curve. Besides that, there is no useful answer.
Your code has two problems. (1) doesn't name the 1st list (this is the direct reason of the error. But if modifiy it, another error happens). (2) In the cashflows sublist, at least one level of ISIN needs more than 1 data.
# ...
CFISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019',
'ltn_2021', 'ltn_2023', 'ltn_2023')) # added a 6th element
CF <- c(1000,1000,1000,1000,1000, 1000) # added a 6th
DATE <- as.Date(c(42736,43101,43466,44197,44927, 44928), origin='1899-12-30') # added a 6th
CASHFLOWS <- list(CFISIN,CF,DATE)
names(CASHFLOWS) <- c("ISIN","CF","DATE")
TODAY <- as.Date(42646, origin='1899-12-30')
brasil <- list(ISIN,MATURITYDATE,ISSUEDATE,
COUPONRATE,PRICE,ACCRUED,CASHFLOWS,TODAY)
names(brasil) <- c("ISIN","MATURITYDATE","ISSUEDATE","COUPONRATE",
"PRICE","ACCRUED","CASHFLOWS","TODAY")
mybonds <- list(brasil = brasil) # named the list
class(mybonds) <- "couponbonds"
ns_res <-estim_nss.couponbonds(mybonds, 'brasil', method = "ns")
Note: the error came from these lines
bonddata <- bonddata[group] # prepro_bond()'s 1st line (the direct reason).
# cf <- lapply(bonddata, create_cashflows_matrix) # the additional error
create_cashflows_matrix(mybonds[[1]], include_price = F) # don't run
i can covert one column into climdexInput object format using the following code :
tmax.dates <- as.PCICt(do.call(paste, t[,c("year",
"days")]), format="%Y %j", cal="gregorian")
tmin.dates <- as.PCICt(do.call(paste, t[,c("year",
"days")]), format="%Y %j", cal="gregorian")
prec.dates <- as.PCICt(do.call(paste, t[,c("year",
"days")]), format="%Y %j", cal="gregorian")
## Load the data in.
ci <- climdexInput.raw(tmax=ntuobs[,1],
tmin=ntuobs[,1],
prec=ntuobs[,1],tmax.dates, tmin.dates, prec.dates, base.range=c(2000, 2010))
## Create a timeseries of monthly maximum 5-day consecutive
however, ntuobs[] has 100 columns and i want apply the function on all 100 columns and then store it into 100 ci[,] columns .
"i tried applying for loop
for (i in 1:100){
ci[,i] <- climdexInput.raw(tmax=ntuobs[,i],
tmin=ntuobs[,i],
prec=ntuobs[,i],tmax.dates, tmin.dates, prec.dates, base.range=c(2000, 2010))
} "
but this gives an error
"Error in ci[, i] <- climdexInput.raw(tmax = changi[, i], tmin = ntuobs[, :
object of type 'S4' is not subsettable"
kindly suggest me ways to tackle this problem. Any loop methods using APPLY function is welcomed.
Thanks
example dataset
library(PCICt)
## Create a climdexInput object from some data already loaded in and
## ready to go.
## Parse the dates into PCICt.
tmax.dates <- as.PCICt(do.call(paste, ec.1018935.tmax[,c("year",
"jday")]), format="%Y %j", cal="gregorian")
tmin.dates <- as.PCICt(do.call(paste, ec.1018935.tmin[,c("year",
"jday")]), format="%Y %j", cal="gregorian")
prec.dates <- as.PCICt(do.call(paste, ec.1018935.prec[,c("year",
"jday")]), format="%Y %j", cal="gregorian")
## Load the data in.
ci <- climdexInput.raw(ec.1018935.tmax$MAX_TEMP,
ec.1018935.tmin$MIN_TEMP, ec.1018935.prec$ONE_DAY_PRECIPITATION,
tmax.dates, tmin.dates, prec.dates, base.range=c(1971, 2000))
## Create a timeseries of annual SDII values.
sdii <- climdex.sdii(ci)
but this is for a single column but mine data is 100 column(ensemble).
seems simple enough and I've been through all similar questions and applied them all... I'm either getting nothing or everything...
Trying to took at water temperatures (WTEMP) for specific date range(SAMPLE_DATE) 2007-06-01 to 2007-09-30 from (allconmon)
here is my code so far...
bydate<-subset(allconmon, allconmon$SAMPLE_DATE > as.Date("2007-06-01") & allconmon$SAMPLE_DATE < as.Date("2007-09-30"))
Ive also tried this but get errors
bydate2<- as.xts(allconmon$WTEMP,order.by=allconmon$SAMPLE_DATE)
bydate2['2007-06-01/2007-09-30']
Error in xts(x, order.by = order.by, frequency = frequency, .CLASS = "double", :
order.by requires an appropriate time-based object
not sure what I'm doing wrong here... seems to work for other people
I will highly recommend you using zoo package in R while dealing with time series data.
The operation you mentioned is actually a window function in zoo.
Here is the example from ?window:
Examples
window(presidents, 1960, c(1969,4)) # values in the 1960's
window(presidents, deltat = 1) # All Qtr1s
window(presidents, start = c(1945,3), deltat = 1) # All Qtr3s
window(presidents, 1944, c(1979,2), extend = TRUE)
pres <- window(presidents, 1945, c(1949,4)) # values in the 1940's
window(pres, 1945.25, 1945.50) <- c(60, 70)
window(pres, 1944, 1944.75) <- 0 # will generate a warning
window(pres, c(1945,4), c(1949,4), frequency = 1) <- 85:89
pres
Here is a list of papers from JSS demonstrating the usage of the zoo package also reshape your data which I found very inspiring.
I figured it out! on multiple levels... first off I didn't notice that R did something funky with my sample date label when I uploaded from text file... probably my fault...
here is a small sample of the data set. its 5,573,301 observations of 30 variables
notice the funky symbol in front of sample date.... not sure why R did that...
ï..SAMPLE_DATE SampleTime STATION SONDE Layer TOTAL_DEPTH TOTAL_DEPTH_A BATT BATT_A WTEMP WTEMP_A SPCOND SPCOND_A SALINITY SALINITY_A DO_SAT DO_SAT_A
however what I did.... (i changed the name to x as allconmon was a bit excessive)
x <- read.csv(file = "C:/Users/Desktop/cmon2001-08.txt",quote = "",header = TRUE,sep = "\t", na.strings = c("","NULL"))
library(chron)
x$month <- months(as.Date(x$ï..SAMPLE_DATE, "%Y-%m-%d"))
x$year <- substr(as.character(x$ï..SAMPLE_DATE), 1, 4)
y <- x[x$month == 'June' | x$month == 'July' | x$month == 'August' | x$month == 'September' ,]
so now I was able to subset all my data by those 4 months and then later by year, station, and water temp....