Reuse a HoltWinters model using new data - r

I'm trying to reuse a HoltWinters model previously generated in R. I have found a related entry here, but it does not seem to work with HoltWinters. Basically I have tried something like this:
myModel<-HoltWinters(ts(myData),gamma=FALSE)
predict(myModel,n.ahead=10)
#time to change the data
predict(myModel,n.ahead=10,newdata=myNewData)
When I try to predict using the new data I get the same prediction.
I would appreciate any suggestion.

You can use update:
mdl <- HoltWinters(EuStockMarkets[,"FTSE"],gamma=FALSE)
predict(mdl,n.ahead=10)
Time Series:
Start = c(1998, 170)
End = c(1998, 179)
Frequency = 260
fit
[1,] 5451.093
[2,] 5447.186
[3,] 5443.279
[4,] 5439.373
[5,] 5435.466
[6,] 5431.559
[7,] 5427.652
[8,] 5423.745
[9,] 5419.838
[10,] 5415.932
predict(update(mdl,x=EuStockMarkets[,"CAC"]),n.ahead=10)
Time Series:
Start = c(1998, 170)
End = c(1998, 179)
Frequency = 260
fit
[1,] 3995.127
[2,] 3995.253
[3,] 3995.380
[4,] 3995.506
[5,] 3995.633
[6,] 3995.759
[7,] 3995.886
[8,] 3996.013
[9,] 3996.139
[10,] 3996.266

predict.HoltWinters doesn't have a newdata argument, which is why the data doesn't get replaced. The prediction doesn't require any data: it is determined entirely by the coefficients component of the fitted model.
m <- HoltWinters(co2)
m$coefficients #These values describe the model completely;
#adding new data makes no difference
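As a rough check of that claim for the non-seasonal case from the question (gamma=FALSE), the forecast can be rebuilt by hand from the two coefficients, level a and trend b. This is a sketch for the trend-only model; a seasonal model would also need the seasonal coefficients.

```r
m <- HoltWinters(EuStockMarkets[, "FTSE"], gamma = FALSE)

co <- m$coefficients   # named vector with level "a" and trend "b"
h  <- 1:10             # forecast horizons

# h-step-ahead forecast of the trend-only model: level + h * trend
manual <- co[["a"]] + h * co[["b"]]

from_predict <- as.numeric(predict(m, n.ahead = 10))
all.equal(manual, from_predict)
```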

Related

Incorrect result when multiplying matrices in R?

I'm getting some weird results when multiplying these two matrices in R:
> matrix3
[1,] 3.19747172 -2.806e-05 -0.00579284 -0.00948720 -0.01054026 0.17575719
[2,] -0.00002806 2.000e-08 0.00000057 0.00000006 -0.00000009 -0.00000358
[3,] -0.00579284 5.700e-07 0.00054269 0.00001793 -0.00002686 -0.00310465
[4,] -0.00948720 6.000e-08 0.00001793 0.00003089 0.00002527 -0.00066290
[5,] -0.01054026 -9.000e-08 -0.00002686 0.00002527 0.00023776 -0.00100898
[6,] 0.17575719 -3.580e-06 -0.00310465 -0.00066290 -0.00100898 0.03725362
> matrix4
[,1]
x0 2428.711
x1 1115178.561
x2 74411.013
x3 925700.445
x4 74727.396
x5 13342.182
> matrix3%*%matrix4
[,1]
[1,] 78.4244581753
[2,] -0.0023802299
[3,] 0.1164568885
[4,] -0.0018504732
[5,] -0.0006493249
[6,] -0.1497822396
The thing is that if you multiply these two matrices in Excel you get:
>78.4824494081686
>-0.0000419022486847151
>0.112430295996347
>-0.000379343461780479
>0.000340414687578061
>-0.14454024116344
Using online matrix calculators I also got Excel's result.
Would love your help in understanding how to get the same result in R.
The problem was caused by using the function inv() from library(matlib).
matrix3 is the result of inverting a matrix with inv().
When I used solve() to invert instead and then continued as before, I got the correct result; I'm not sure why.
Perhaps there is some kind of rounding in the inv() function.
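The printed matrix3 has no entries finer than 1e-08, which would be consistent with the rounding guess. The sketch below does not use matlib at all; it only illustrates, under the assumption that the inverse was rounded to 8 decimals, how such rounding can change a product when the other factor has large entries. A and b are made-up values.

```r
A <- diag(c(1, 3e8))   # made-up matrix with very different scales
b <- c(1, 3e8)

inv_exact   <- solve(A)
inv_rounded <- round(inv_exact, 8)   # assumption: mimics an inverse rounded to 8 decimals

inv_exact %*% b     # second entry is (numerically) 1
inv_rounded %*% b   # second entry is 0, because 1/3e8 was rounded away
```

The small entries of the inverse are exactly the ones that get multiplied by the large entries of the other matrix, so the rounding error is not negligible in the product.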

Predict() new data into PCA space in R

After performing a principal component analysis of a first data set (a), I projected a second data set (b) into PCA space of the first data set.
From this, I want to extract the variable loadings for the projected analysis of (b). Variable loadings of the PCA of (a) are returned by prcomp(). How can I retrieve the variable loadings of (b), projected into PCA space of (a)?
# set seed and define variables
set.seed(1)
a = replicate(10, rnorm(10))
b = replicate (10, rnorm(10))
# pca of data A and project B into PCA space of A
pca.a = prcomp(a)
project.b = predict(pca.a, b)
# variable loadings
loads.a = pca.a$rotation
Here's an annotated version of your code to make it clear what is happening at each step. First, the original PCA is performed on matrix a:
pca.a = prcomp(a)
This calculates the loadings for each principal component (PC). At the next step, these loadings together with a new data set, b, are used to calculate PC scores:
project.b = predict(pca.a, b)
So, the loadings are the same, but the PC scores are different. If we look at project.b, we see that each column corresponds to a PC:
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
[1,] -0.2922447 0.10253581 0.55873366 1.3168437 1.93686163 0.998935945 2.14832483 -1.43922296
[2,] 0.1855480 -0.97631967 -0.06419207 0.6375200 -1.63994127 0.110028191 -0.27612541 -0.37640710
[3,] -1.5924242 0.31368878 -0.63199409 -0.2535251 0.59116005 0.214116915 1.20873962 -0.64494388
[4,] 1.2117977 0.29213928 1.53928110 -0.7755299 0.16586295 0.030802395 0.63225374 -1.72053189
[5,] 0.5637298 0.13836395 -1.41236348 0.2931681 -0.64187233 1.035226594 0.67933996 -1.05234872
[6,] 0.2874210 1.18573157 0.04358772 -1.1941734 -0.04399808 -0.113752847 -0.33507195 -1.34592414
[7,] 0.5629731 -1.02835365 0.36218131 1.4117908 -0.96923175 -1.213684882 0.02221423 1.14483112
[8,] 1.2854406 0.09373952 -1.46038333 0.6885674 0.39455369 0.756654205 1.97699073 -1.17281174
[9,] 0.8573656 0.07810452 -0.06576772 -0.5200661 0.22985518 0.007571489 2.29289637 -0.79979214
[10,] 0.1650144 -0.50060018 -0.14882996 0.2065622 2.79581428 0.813803739 0.71632238 0.09845912
PC9 PC10
[1,] -0.19795112 0.7914249
[2,] 1.09531789 0.4595785
[3,] -1.50564724 0.2509829
[4,] 0.05073079 0.6066653
[5,] -1.62126318 0.1959087
[6,] 0.14899277 2.9140809
[7,] 1.81473300 0.0617095
[8,] 1.47422298 0.6670124
[9,] -0.53998583 0.7051178
[10,] 0.80919039 1.5207123
Hopefully, that makes sense, but I'm yet to finish my first coffee of the day, so no guarantees.
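To make the point concrete, the projection can be reproduced by hand: b is centred with the column means of a and multiplied by the loadings of a. So there is no separate set of loadings for b; pca.a$rotation is used for both. A minimal sketch with the data from the question (manual.b is just an illustrative name):

```r
set.seed(1)
a <- replicate(10, rnorm(10))
b <- replicate(10, rnorm(10))

pca.a     <- prcomp(a)
project.b <- predict(pca.a, b)

# centre b with the column means of a, then rotate with a's loadings
manual.b <- scale(b, center = pca.a$center, scale = FALSE) %*% pca.a$rotation

all.equal(manual.b, project.b, check.attributes = FALSE)
```

If prcomp had been called with scale. = TRUE, the manual version would also need to divide by pca.a$scale.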

How to access sapply output including lmer

I'm trying to estimate and simulate various models. I can fit many models with sapply but somehow, I'm unable to access the output.
models <- sapply("accept.progov ~ ptot_dev+swacceptcn+(swacceptcn|coal.general)", FUN = function(X) lmer(X, data=dbq))
I have got as far as working with the fitted models by applying sapply again, for example to simulate:
sims <- sapply(X = models , FUN = function(X) sim(X, n=10))
However, now I need to extract the fixef and ranef of sims. By printing models or sims they look fairly like lmer outputs, but they are not. It seems logical that I get such an error message when trying to access fixed effects as with lmer output:
sims@fixef
Error: trying to get slot "fixef" from an object of a basic class ("list") with no slots
class(sims)
[1] "list"
Any idea on how to access the output (or convert it to be able to access it)?
Thanks
Here's the output of sims:
sims
$`accept.progov ~ ptot_dev+swacceptcn+(swacceptcn|coal.general)`
An object of class "sim.merMod"
Slot "fixef":
(Intercept) ptot_dev swacceptcn
[1,] 71.26230 -0.5967700 -5.125157
[2,] 72.31654 -0.3331660 -13.210371
[3,] 72.73718 -0.3910768 -15.319903
[4,] 68.60344 -0.5775278 -10.106682
[5,] 70.36609 -0.3897952 -7.883180
[6,] 70.11542 -0.3413212 -10.959867
[7,] 73.26847 -0.4599989 -10.302523
[8,] 73.46677 -0.4627529 -14.547429
[9,] 69.99146 -0.5947487 -8.681075
[10,] 71.97546 -0.4976680 -10.109415
Slot "ranef":
$coal.general
, , (Intercept)
1 2 3 4 5 6 7 8 9
[1,] -0.3275480720 -10.93724811 12.692639 -3.727188 -0.2119881 1.63602645 1.4972587 -0.4007792 1.354840
[2,] -2.9357382258 -8.47344764 9.832591 -15.602822 -2.0867660 -3.32143496 7.1446528 -7.2902852 10.593827
[3,] -0.5738514837 -6.58777257 7.189278 3.272100 -3.7302182 -2.77115752 4.6410860 -6.9497532 7.013610
[4,] 0.0008799287 -9.42620987 7.733388 -8.888649 -2.7795506 -1.98193393 -3.1739529 2.4603618 1.307669
[5,] 1.5177874134 -10.51052960 10.816926 -4.103975 -8.2232044 0.43857146 4.5353983 -8.1371223 -5.734714
[6,] 0.3591081598 -4.71170518 11.391860 -15.928789 -10.3654403 5.13397114 -1.9557418 3.6573842 7.846707
[7,] -0.1520099025 -9.97569519 5.973820 -6.601445 -5.8213534 -5.97398796 9.1813633 12.0905868 -2.689435
[8,] -3.2966495558 -3.88700417 12.069134 3.972661 -1.3056792 -5.41674684 -0.7940412 3.3800106 6.113203
[9,] 0.9239716129 -0.03016792 -4.695256 -5.092695 -1.4194101 5.82820816 6.7456858 9.4024483 7.683213
[10,] 1.8038318596 -6.69924367 9.612527 -7.118014 -13.3545691 0.03555004 7.5745529 1.6765752 8.020667
, , swacceptcn
1 2 3 4 5 6 7 8 9
[1,] -10.799839 7.400621 3.835463 -7.5630236 -4.112801 -1.108058 -9.648384 -1.729799 -0.5488257
[2,] -4.962062 4.103715 11.493087 6.1079040 -4.432072 6.097044 -5.972890 5.072467 -2.7055490
[3,] -3.831015 0.486487 13.724554 -16.0322440 -5.487974 6.453326 -1.208757 13.072152 -3.1340066
[4,] -3.053745 8.054387 12.682886 2.8787329 3.365597 2.195597 4.271775 5.460537 2.9898383
[5,] -8.098502 4.055499 3.944880 -3.8708456 -14.567725 3.413494 -10.604984 12.821358 7.1130135
[6,] -6.626984 3.892675 7.205407 6.3425843 9.328326 -4.693105 5.304151 11.150812 -3.4270667
[7,] -13.920626 7.548634 9.682934 -5.3058276 -1.991851 4.429253 -16.905243 -10.927869 -2.0806977
[8,] -3.863126 2.470756 9.284932 -20.1617879 -5.352519 8.871024 -1.122215 -1.211589 -0.1492944
[9,] -7.229178 -5.695966 25.527378 -1.7627386 -8.622444 -2.557726 -8.459804 -7.526883 -3.7090101
[10,] -11.098350 3.598449 7.642130 0.2573062 2.701967 5.834333 -14.552764 4.590748 -12.1888232
Slot "sigma":
[1] 11.96711 11.93222 11.93597 11.35270 11.31093 11.23100 11.89647 11.62934 11.61448 11.74406
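The printout suggests that sims is an ordinary list whose single element is the S4 object, so the slot has to be taken from the element and not from the list. The sketch below uses a toy S4 class, simDemo, as a hypothetical stand-in for "sim.merMod" so that it runs without the modelling packages; the same indexing pattern should carry over to the real object.

```r
library(methods)

# toy S4 class standing in for "sim.merMod" (hypothetical, for illustration only)
setClass("simDemo", representation(fixef = "matrix", sigma = "numeric"))

sims <- list(
  model1 = new("simDemo",
               fixef = matrix(c(71.3, 72.3, -0.6, -0.3, -5.1, -13.2), nrow = 2,
                              dimnames = list(NULL, c("(Intercept)", "x1", "x2"))),
               sigma = c(11.9, 11.8))
)

class(sims)        # "list": this is why sims@fixef fails

sims[[1]]@fixef    # select the list element first, then the slot

# with several models, extract the slot from every element
fixefs <- lapply(sims, function(s) s@fixef)
```

Using lapply instead of sapply when fitting also makes it explicit that the result is a list of model objects.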

Matching a fixed point with an interval in a data frame

I'm trying to match stock trades from one data frame with the mid-quote that was prevailing at the time of each trade. The time stamps don't match exactly; for each trade I only have the interval between two consecutive quotes in which it happened.
I wrote a loop which works, but since I know that loops should be avoided whenever possible, I looked for an alternative.
First, this is my loop:
t=dim(x1)[1]
z=1
for (i in 1:t) {
  flag=FALSE
  while(flag==FALSE){
    if(x1[z,1]>x2[i,1]){
      x2[i,2]=x1[z-1,2]
      flag=TRUE
    }
    else {
      z=z+1
    }
  }
}
I found advice on Stack Overflow to merge the two arrays, so I added the upper bound of the interval as another column and matched the corresponding times with the subset function.
Unfortunately, this method takes far more time than the loop. I assume it's due to the huge array that is created by merging. The data frames with the quotes have about 500,000 observations and the transaction data about 100,000.
Is there a more elegant (and especially faster) way to solve this problem?
Furthermore, for some data I get the error message "missing value where TRUE/FALSE needed", even though the if-condition works when I do it manually.
edit:
My quote data would look like this:
Time midquote
[1,] 35551 50.85229
[2,] 35589 53.77627
[3,] 36347 54.27945
[4,] 37460 52.01283
[5,] 37739 53.65414
[6,] 38249 52.34947
[7,] 38426 50.59568
[8,] 39858 53.75646
[9,] 40219 51.38876
[10,] 40915 52.09319
and my transaction data:
Time midquote
[1,] 36429 0
[2,] 38966 0
[3,] 39334 0
[4,] 39998 0
[5,] 40831 0
So for each time in the transaction data I want the mid-quote from the corresponding interval in the quote data. The time in the example is in seconds from midnight.
For your example datasets, the following approach is faster:
x2[ , 2] <- x1[vapply(x2[, 1], function(x) which(x <= x1[, 1])[1] - 1L,
FUN.VALUE = integer(1)), 2]
# Time midquote
# [1,] 36429 54.27945
# [2,] 38966 50.59568
# [3,] 39334 50.59568
# [4,] 39998 53.75646
# [5,] 40831 51.38876
A second approach:
o <- order(c(x1[ , 1], x2[ , 1]))
tmp <- c(x1[ , 2], x2[ , 2])[o]
idx <- which(!tmp)
x2[ , 2] <- tmp[unlist(tapply(idx, c(0, cumsum(diff(idx) > 1)),
function(x) x - seq_along(x)), use.names = FALSE)]
# Time midquote
# [1,] 36429 54.27945
# [2,] 38966 50.59568
# [3,] 39334 50.59568
# [4,] 39998 53.75646
# [5,] 40831 51.38876
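Another option, sketched here on the example data, is base R's findInterval(), which returns for each trade time the index of the last quote time that is not later than it. It assumes the quote times in x1 are sorted in increasing order; the NA guard for trades before the first quote is an addition for illustration.

```r
x1 <- cbind(Time = c(35551, 35589, 36347, 37460, 37739,
                     38249, 38426, 39858, 40219, 40915),
            midquote = c(50.85229, 53.77627, 54.27945, 52.01283, 53.65414,
                         52.34947, 50.59568, 53.75646, 51.38876, 52.09319))
x2 <- cbind(Time = c(36429, 38966, 39334, 39998, 40831), midquote = 0)

# index of the prevailing quote for every trade (0 if the trade precedes all quotes)
idx <- findInterval(x2[, "Time"], x1[, "Time"])
idx[idx == 0] <- NA   # no prevailing quote available

x2[, "midquote"] <- x1[idx, "midquote"]
x2
```

On the example this gives the same mid-quotes as the two approaches above, and it avoids scanning x1 once per trade.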

Scaling a numeric matrix in R with values 0 to 1

Here is an excerpt of a numeric matrix that I have
[1,] 30 -33.129487 3894754.1 -39.701738 -38.356477 -34.220534
[2,] 29 -44.289487 -8217525.9 -44.801738 -47.946477 -41.020534
[3,] 28 -48.439487 -4572815.9 -49.181738 -48.086477 -46.110534
[4,] 27 -48.359487 -2454575.9 -42.031738 -43.706477 -43.900534
[5,] 26 -38.919487 -2157535.9 -47.881738 -43.576477 -46.330534
[6,] 25 -45.069487 -5122485.9 -47.831738 -47.156477 -42.860534
[7,] 24 -46.207487 -2336325.9 -53.131738 -50.576477 -50.410534
[8,] 23 -51.127487 -2637685.9 -43.121738 -47.336477 -47.040534
[9,] 22 -45.645487 3700424.1 -56.151738 -47.396477 -50.720534
[10,] 21 -56.739487 1572594.1 -49.831738 -54.386577 -52.470534
[11,] 20 -46.319487 642214.1 -39.631738 -44.406577 -41.490534
What I want to do now, is to scale the values for each column to have values from 0 to 1.
I tried to accomplish this using the scale() function on my matrix (default parameters), and I got this
[1,] -0.88123100 0.53812440 -1.05963281 -1.031191482 -0.92872324
[2,] -1.17808251 -1.13538649 -1.19575096 -1.289013031 -1.11327085
[3,] -1.28847084 -0.63180980 -1.31265244 -1.292776849 -1.25141017
[4,] -1.28634287 -0.33914007 -1.12182012 -1.175023107 -1.19143220
[5,] -1.03524267 -0.29809911 -1.27795565 -1.171528133 -1.25738083
[6,] -1.19883019 -0.70775576 -1.27662116 -1.267774342 -1.16320727
[7,] -1.22910054 -0.32280189 -1.41807728 -1.359719044 -1.36810940
[8,] -1.35997055 -0.36443973 -1.15091204 -1.272613537 -1.27664977
[9,] -1.21415156 0.51127451 -1.49868058 -1.274226602 -1.37652260
[10,] -1.50924749 0.21727976 -1.33000083 -1.462151358 -1.42401647
[11,] -1.23207969 0.08873245 -1.05776452 -1.193844887 -1.12602635
This is already close to what I want, but values from 0 to 1 would be even better. I read the help page of scale(), but I really don't understand how I would do that.
Try the following, which seems simple enough:
## Data to make a minimal reproducible example
m <- matrix(rnorm(9), ncol=3)
## Rescale each column to range between 0 and 1
apply(m, MARGIN = 2, FUN = function(X) (X - min(X))/diff(range(X)))
# [,1] [,2] [,3]
# [1,] 0.0000000 0.0000000 0.5220198
# [2,] 0.6239273 1.0000000 0.0000000
# [3,] 1.0000000 0.9253893 1.0000000
And if you were still to use scale:
## a is the numeric matrix to rescale
maxs <- apply(a, 2, max)
mins <- apply(a, 2, min)
scale(a, center = mins, scale = maxs - mins)
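As a quick sanity check, the apply() version and the scale() version can be compared on a small made-up matrix; both subtract the column minimum and divide by the column range. The matrix a below is random illustration data, not the matrix from the question.

```r
set.seed(42)
a <- matrix(rnorm(12, mean = -40, sd = 5), ncol = 3)

by_apply <- apply(a, 2, function(x) (x - min(x)) / diff(range(x)))

mins <- apply(a, 2, min)
maxs <- apply(a, 2, max)
by_scale <- scale(a, center = mins, scale = maxs - mins)

# same numbers; scale() only adds "scaled:center"/"scaled:scale" attributes
all.equal(by_apply, by_scale, check.attributes = FALSE)

apply(by_scale, 2, range)   # every column now runs from 0 to 1
```

A constant column has range zero, so either version would produce NaN for it; such columns need separate handling.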
Install the clusterSim package and run the following command:
normX = data.Normalization(x,type="n4");
The scales package has a function called rescale:
set.seed(2020)
x <- runif(5, 100, 150)
scales::rescale(x)
#1.0000000 0.5053362 0.9443995 0.6671695 0.0000000
Not the prettiest but this just got the job done, since I needed to do this in a dataframe.
column_zero_one_range_scale <- function(
  input_df,
  columns_to_scale  # column indices in input_df to scale, must be numeric
){
  input_df_replace <- input_df
  for (columnnum in columns_to_scale) {
    vec <- input_df[[columnnum]]
    # report and skip anything that is not numeric or integer,
    # so a stale result from a previous column is never written
    if (!is.numeric(vec)) {
      print(paste0('Column name ', colnames(input_df)[columnnum],
                   ' not an integer or numeric, will skip'))
      next
    }
    # (x - min) / (max - min), ignoring missing values
    rangevec <- max(vec, na.rm = TRUE) - min(vec, na.rm = TRUE)
    input_df_replace[[columnnum]] <- (vec - min(vec, na.rm = TRUE)) / rangevec
    # paste0 avoids the stray space that paste() would put before '_scaled'
    colnames(input_df_replace)[columnnum] <- paste0(colnames(input_df)[columnnum], '_scaled')
  }
  return(input_df_replace)
}

Resources