I'm trying to estimate and simulate various models. I can fit many models with sapply, but somehow I'm unable to access the output.
library(lme4)  # lmer()
library(arm)   # sim()
models <- sapply("accept.progov ~ ptot_dev+swacceptcn+(swacceptcn|coal.general)", FUN = function(X) lmer(X, data=dbq))
I got far enough that I can work with the models by applying sapply again, for example to simulate:
sims <- sapply(X = models , FUN = function(X) sim(X, n=10))
However, now I need to extract the fixef and ranef of sims. When printed, models and sims look a lot like lmer output, but they are not lmer objects, so it seems logical that I get the following error when trying to access the fixed effects the way I would with lmer output:
sims@fixef
Error: trying to get slot "fixef" from an object of a basic class ("list") with no slots
class(sims)
[1] "list"
Any idea on how to access the output (or convert it to be able to access it)?
Thanks
Here's the output of sims:
sims
$`accept.progov ~ ptot_dev+swacceptcn+(swacceptcn|coal.general)`
An object of class "sim.merMod"
Slot "fixef":
(Intercept) ptot_dev swacceptcn
[1,] 71.26230 -0.5967700 -5.125157
[2,] 72.31654 -0.3331660 -13.210371
[3,] 72.73718 -0.3910768 -15.319903
[4,] 68.60344 -0.5775278 -10.106682
[5,] 70.36609 -0.3897952 -7.883180
[6,] 70.11542 -0.3413212 -10.959867
[7,] 73.26847 -0.4599989 -10.302523
[8,] 73.46677 -0.4627529 -14.547429
[9,] 69.99146 -0.5947487 -8.681075
[10,] 71.97546 -0.4976680 -10.109415
Slot "ranef":
$coal.general
, , (Intercept)
1 2 3 4 5 6 7 8 9
[1,] -0.3275480720 -10.93724811 12.692639 -3.727188 -0.2119881 1.63602645 1.4972587 -0.4007792 1.354840
[2,] -2.9357382258 -8.47344764 9.832591 -15.602822 -2.0867660 -3.32143496 7.1446528 -7.2902852 10.593827
[3,] -0.5738514837 -6.58777257 7.189278 3.272100 -3.7302182 -2.77115752 4.6410860 -6.9497532 7.013610
[4,] 0.0008799287 -9.42620987 7.733388 -8.888649 -2.7795506 -1.98193393 -3.1739529 2.4603618 1.307669
[5,] 1.5177874134 -10.51052960 10.816926 -4.103975 -8.2232044 0.43857146 4.5353983 -8.1371223 -5.734714
[6,] 0.3591081598 -4.71170518 11.391860 -15.928789 -10.3654403 5.13397114 -1.9557418 3.6573842 7.846707
[7,] -0.1520099025 -9.97569519 5.973820 -6.601445 -5.8213534 -5.97398796 9.1813633 12.0905868 -2.689435
[8,] -3.2966495558 -3.88700417 12.069134 3.972661 -1.3056792 -5.41674684 -0.7940412 3.3800106 6.113203
[9,] 0.9239716129 -0.03016792 -4.695256 -5.092695 -1.4194101 5.82820816 6.7456858 9.4024483 7.683213
[10,] 1.8038318596 -6.69924367 9.612527 -7.118014 -13.3545691 0.03555004 7.5745529 1.6765752 8.020667
, , swacceptcn
1 2 3 4 5 6 7 8 9
[1,] -10.799839 7.400621 3.835463 -7.5630236 -4.112801 -1.108058 -9.648384 -1.729799 -0.5488257
[2,] -4.962062 4.103715 11.493087 6.1079040 -4.432072 6.097044 -5.972890 5.072467 -2.7055490
[3,] -3.831015 0.486487 13.724554 -16.0322440 -5.487974 6.453326 -1.208757 13.072152 -3.1340066
[4,] -3.053745 8.054387 12.682886 2.8787329 3.365597 2.195597 4.271775 5.460537 2.9898383
[5,] -8.098502 4.055499 3.944880 -3.8708456 -14.567725 3.413494 -10.604984 12.821358 7.1130135
[6,] -6.626984 3.892675 7.205407 6.3425843 9.328326 -4.693105 5.304151 11.150812 -3.4270667
[7,] -13.920626 7.548634 9.682934 -5.3058276 -1.991851 4.429253 -16.905243 -10.927869 -2.0806977
[8,] -3.863126 2.470756 9.284932 -20.1617879 -5.352519 8.871024 -1.122215 -1.211589 -0.1492944
[9,] -7.229178 -5.695966 25.527378 -1.7627386 -8.622444 -2.557726 -8.459804 -7.526883 -3.7090101
[10,] -11.098350 3.598449 7.642130 0.2573062 2.701967 5.834333 -14.552764 4.590748 -12.1888232
Slot "sigma":
[1] 11.96711 11.93222 11.93597 11.35270 11.31093 11.23100 11.89647 11.62934 11.61448 11.74406
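A note on the structure: class(sims) shows that sims is a plain list whose elements are the sim.merMod objects, so the slots have to be accessed on the list elements rather than on the list itself. A minimal sketch, using the objects created above:
sims[[1]]@fixef                    # fixed-effect draws of the first model in the list
sims[[1]]@ranef                    # random-effect draws of the first model
lapply(sims, function(s) s@fixef)  # fixed-effect draws for every model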
I am using the RcppHMM package to build a GHMM (multivariate Gaussian mixture HMM) with continuous observations.
I want to train it with the EM algorithm on continuous observations that have different sequence lengths.
To be specific, each observation has a sequence length between 3 and 6.
I tried to fit the model on the whole observation dataset at once (I built the dataset with ncol = 6, the maximum sequence length, and padded the empty part with zeros), but it didn't work,
so I split the observations into groups with the same length [O3, O4, O5, O6]
and updated the model with each group in turn.
Each observation group looks like this:
O3
[,1] [,2] [,3]
[1,] 0.8550940 0.3231340 0.8639223
[2,] 0.4453262 0.5840305 0.4356958
[3,] 0.4344789 -1.2234760 0.4344789
[4,] -0.5003085 3.0322560 -0.5003085
[5,] -0.1459598 -0.4661041 -0.1459598
[6,] -0.1977263 -0.6352724 -0.1977263
O4
[,1] [,2] [,3] [,4]
[1,] 0.8965332 0.3338220 0.7270241 0.8824540
[2,] 0.4033438 0.4131293 0.1593136 0.4187023
[3,] -0.7329015 -1.6828296 -0.1550487 -0.1550487
[4,] -0.3213490 7.3449076 -0.2787857 -0.2787857
[5,] -0.2868067 -0.3743332 -0.1340566 -0.1340566
[6,] 2.6832742 -0.5844305 0.2320774 0.2320774
O5
[,1] [,2] [,3] [,4] [,5]
[1,] 0.83401341 0.2492370 0.47493190 0.6440035 0.84985396
[2,] 0.37988234 0.2335883 0.17043570 0.2116066 0.36260248
[3,] -0.05240445 -0.3034002 -0.05240445 -0.3034002 -0.05240445
[4,] -0.37240867 1.1500528 -0.37240867 1.1500528 -0.37240867
[5,] -0.02056839 0.9343497 -0.02056839 0.9343497 -0.02056839
[6,] -0.27586584 -0.4406833 -0.27586584 -0.4406833 -0.27586584
O6
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.9287066 0.35065802 0.4493442 0.6142040 0.7423286 0.9217381
[2,] 0.3852644 0.09612516 0.1623447 0.1320334 0.1875127 0.3928661
[3,] 0.1436024 -0.08326038 0.7800491 0.1436024 0.1926751 0.1436024
[4,] -0.4284304 -0.27916609 -0.5224586 -0.4284304 0.1267840 -0.4284304
[5,] -0.8846364 -0.81131525 -0.1781479 -0.8846364 -0.1266250 -0.8846364
[6,] -0.2141231 -0.78377461 -0.4440142 -0.2141231 -0.7888260 -0.2141231
nrow is the dimensionality of the observations, and ncol is the sequence length.
When I updated the model with the first group (sequence length 3), it worked.
But when I tried to update the model again with the second group (sequence length 4), I got the following warning:
In learnEM(newModel, O4[, 1:4, ], iter = 20, delta = 1e-05, print = TRUE) :
It is recommended to have a covariance matrix with a determinant bigger than 1/ ((2*PI)^k) .
Does anyone know how to fix this warning?
And is there a proper way to run the EM algorithm with observations of different sequence lengths using this package?
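In case it helps to see the workflow in code, the per-group update described above would look roughly like the sketch below, assuming model is an already initialised GHMM from RcppHMM and O3 to O6 are 3-D observation arrays (dimension x sequence length x number of sequences), as in the learnEM() call quoted in the warning:
library(RcppHMM)
# Sequential EM updates, one group of equal-length sequences at a time;
# each call continues from the model returned by the previous call
groups <- list(O3, O4, O5, O6)
for (O in groups) {
  model <- learnEM(model, O, iter = 20, delta = 1e-05, print = TRUE)
}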
I'm getting some weird results when multiplying these two matrices in R:
> matrix3
[1,] 3.19747172 -2.806e-05 -0.00579284 -0.00948720 -0.01054026 0.17575719
[2,] -0.00002806 2.000e-08 0.00000057 0.00000006 -0.00000009 -0.00000358
[3,] -0.00579284 5.700e-07 0.00054269 0.00001793 -0.00002686 -0.00310465
[4,] -0.00948720 6.000e-08 0.00001793 0.00003089 0.00002527 -0.00066290
[5,] -0.01054026 -9.000e-08 -0.00002686 0.00002527 0.00023776 -0.00100898
[6,] 0.17575719 -3.580e-06 -0.00310465 -0.00066290 -0.00100898 0.03725362
> matrix4
[,1]
x0 2428.711
x1 1115178.561
x2 74411.013
x3 925700.445
x4 74727.396
x5 13342.182
> matrix3%*%matrix4
[,1]
[1,] 78.4244581753
[2,] -0.0023802299
[3,] 0.1164568885
[4,] -0.0018504732
[5,] -0.0006493249
[6,] -0.1497822396
The thing is that if you multiply these two matrices in Excel you get:
>78.4824494081686
>-0.0000419022486847151
>0.112430295996347
>-0.000379343461780479
>0.000340414687578061
>-0.14454024116344
Using an online matrix calculator I also got Excel's result.
I would love your help in understanding how to get the same result in R.
The problem occurred due to the use of the function inv() from library(matlib).
matrix3 was the result of inverting a matrix with inv().
I'm not sure why, but when I used solve() to invert instead and then continued normally, I got the correct result.
Perhaps there is some kind of rounding in the inv() function.
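A small sketch of the fix described above: compute the inverse with base R's solve() instead of matlib's inv() and compare the two results (X and b are placeholders here, since the original matrices aren't fully reproducible from the post):
library(matlib)                          # provides the inv() that caused the issue
X <- crossprod(matrix(rnorm(36), 6, 6))  # placeholder invertible 6x6 matrix
b <- matrix(rnorm(6), ncol = 1)          # placeholder column vector
inv(X) %*% b     # product using matlib::inv(), which appears to involve some rounding
solve(X) %*% b   # product using base solve(), at full double precision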
I have a matrix of n variables and I want to make a new matrix that contains the pairwise differences between the column vectors, but not of each vector with itself. Here is an example of the data.
Transportation.services Recreational.goods.and.vehicles Recreation.services Other.services
2.958003 -0.25983789 5.526694 2.8912009
2.857370 -0.03425164 5.312857 2.9698044
2.352275 0.30536569 4.596742 2.9190123
2.093233 0.65920773 4.192716 3.2567390
1.991406 0.92246531 3.963058 3.6298314
2.065791 1.06120930 3.692287 3.4422340
I tried the for loop below, but I'm aware that R is very slow with loops.
Difference.Matrix <- function(data){
  n <- 2
  new.cols <- "New Columns"
  list <- list()
  for (i in 1:ncol(data)){
    for (j in n:ncol(data)){
      name <- paste("diff", i, j, data[,i], data[,j], sep=".")
      new <- data[,i] - data[,j]
      list[[new.cols]] <- c(name)
      # merge() has no common columns here, so it does a cross join and the
      # data grows explosively on every iteration
      data <- merge(data, new)
    }
    n <- n + 1
  }
  results <- list(data = data)
  return(results)
}
As I said before, the code runs very slowly and has not even finished a single pass yet. I apologize for the beginner-level code. I am also aware that this leaves the original data in the result, but I can delete it later.
Is it possible for me to use an apply function or foreach on this data?
You can find the pairs with combn and use apply to create the result:
apply(combn(ncol(d), 2), 2, function(x) d[,x[1]] - d[,x[2]])  # d holds the example data above as a data frame
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 3.217841 -2.568691 0.0668021 -5.786532 -3.151039 2.6354931
## [2,] 2.891622 -2.455487 -0.1124344 -5.347109 -3.004056 2.3430526
## [3,] 2.046909 -2.244467 -0.5667373 -4.291376 -2.613647 1.6777297
## [4,] 1.434025 -2.099483 -1.1635060 -3.533508 -2.597531 0.9359770
## [5,] 1.068941 -1.971652 -1.6384254 -3.040593 -2.707366 0.3332266
## [6,] 1.004582 -1.626496 -1.3764430 -2.631078 -2.381025 0.2500530
You can add appropriate names with another apply. Here the column names are very long, which impairs the formatting, but the labels show which difference is in each column:
x <- apply(combn(ncol(d), 2), 2, function(x) d[,x[1]] - d[,x[2]])
colnames(x) <- apply(combn(ncol(d), 2), 2, function(x) paste(names(d)[x], collapse=' - '))
> x
Transportation.services - Recreational.goods.and.vehicles Transportation.services - Recreation.services
[1,] 3.217841 -2.568691
[2,] 2.891622 -2.455487
[3,] 2.046909 -2.244467
[4,] 1.434025 -2.099483
[5,] 1.068941 -1.971652
[6,] 1.004582 -1.626496
Transportation.services - Other.services Recreational.goods.and.vehicles - Recreation.services
[1,] 0.0668021 -5.786532
[2,] -0.1124344 -5.347109
[3,] -0.5667373 -4.291376
[4,] -1.1635060 -3.533508
[5,] -1.6384254 -3.040593
[6,] -1.3764430 -2.631078
Recreational.goods.and.vehicles - Other.services Recreation.services - Other.services
[1,] -3.151039 2.6354931
[2,] -3.004056 2.3430526
[3,] -2.613647 1.6777297
[4,] -2.597531 0.9359770
[5,] -2.707366 0.3332266
[6,] -2.381025 0.2500530
Here is an excerpt of the numeric matrix that I have:
[1,] 30 -33.129487 3894754.1 -39.701738 -38.356477 -34.220534
[2,] 29 -44.289487 -8217525.9 -44.801738 -47.946477 -41.020534
[3,] 28 -48.439487 -4572815.9 -49.181738 -48.086477 -46.110534
[4,] 27 -48.359487 -2454575.9 -42.031738 -43.706477 -43.900534
[5,] 26 -38.919487 -2157535.9 -47.881738 -43.576477 -46.330534
[6,] 25 -45.069487 -5122485.9 -47.831738 -47.156477 -42.860534
[7,] 24 -46.207487 -2336325.9 -53.131738 -50.576477 -50.410534
[8,] 23 -51.127487 -2637685.9 -43.121738 -47.336477 -47.040534
[9,] 22 -45.645487 3700424.1 -56.151738 -47.396477 -50.720534
[10,] 21 -56.739487 1572594.1 -49.831738 -54.386577 -52.470534
[11,] 20 -46.319487 642214.1 -39.631738 -44.406577 -41.490534
What I want to do now is to scale the values in each column so that they range from 0 to 1.
I tried to accomplish this with the scale() function on my matrix (default parameters), and I got this:
[1,] -0.88123100 0.53812440 -1.05963281 -1.031191482 -0.92872324
[2,] -1.17808251 -1.13538649 -1.19575096 -1.289013031 -1.11327085
[3,] -1.28847084 -0.63180980 -1.31265244 -1.292776849 -1.25141017
[4,] -1.28634287 -0.33914007 -1.12182012 -1.175023107 -1.19143220
[5,] -1.03524267 -0.29809911 -1.27795565 -1.171528133 -1.25738083
[6,] -1.19883019 -0.70775576 -1.27662116 -1.267774342 -1.16320727
[7,] -1.22910054 -0.32280189 -1.41807728 -1.359719044 -1.36810940
[8,] -1.35997055 -0.36443973 -1.15091204 -1.272613537 -1.27664977
[9,] -1.21415156 0.51127451 -1.49868058 -1.274226602 -1.37652260
[10,] -1.50924749 0.21727976 -1.33000083 -1.462151358 -1.42401647
[11,] -1.23207969 0.08873245 -1.05776452 -1.193844887 -1.12602635
This is already close to what I want, but values from 0 to 1 would be even better. I read the help page for scale(), but I really don't understand how to do that.
Try the following, which seems simple enough:
## Data to make a minimal reproducible example
m <- matrix(rnorm(9), ncol=3)
## Rescale each column to range between 0 and 1
apply(m, MARGIN = 2, FUN = function(X) (X - min(X))/diff(range(X)))
# [,1] [,2] [,3]
# [1,] 0.0000000 0.0000000 0.5220198
# [2,] 0.6239273 1.0000000 0.0000000
# [3,] 1.0000000 0.9253893 1.0000000
And if you still wanted to use scale():
# here a stands for your data matrix
maxs <- apply(a, 2, max)
mins <- apply(a, 2, min)
scale(a, center = mins, scale = maxs - mins)
Install the clusterSim package and run the following command:
library(clusterSim)
normX <- data.Normalization(x, type = "n4")
The scales package has a function called rescale():
set.seed(2020)
x <- runif(5, 100, 150)
scales::rescale(x)
#1.0000000 0.5053362 0.9443995 0.6671695 0.0000000
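One caveat: called on a whole matrix, rescale() uses the global minimum and maximum, so to scale each column to 0-1 independently (as asked above) it can be combined with apply. A small sketch, with m standing for your numeric matrix:
apply(m, 2, scales::rescale)  # rescales every column to [0, 1] separately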
Not the prettiest, but this got the job done, since I needed to do it in a data frame.
column_zero_one_range_scale <- function(
  input_df,
  columns_to_scale  # columns in input_df to scale, must be numeric
){
  input_df_replace <- input_df
  for (columnnum in columns_to_scale){
    if (!class(input_df[, columnnum]) %in% c('numeric', 'integer')){
      print(paste0('Column name ', colnames(input_df)[columnnum], ' not an integer or numeric, will skip'))
      next  # skip non-numeric columns instead of silently reusing the previous result
    }
    vec <- input_df[, columnnum]
    rangevec <- max(vec, na.rm = TRUE) - min(vec, na.rm = TRUE)
    vec2 <- (vec - min(vec, na.rm = TRUE)) / rangevec
    input_df_replace[, columnnum] <- vec2
    colnames(input_df_replace)[columnnum] <- paste0(colnames(input_df)[columnnum], '_scaled')
  }
  return(input_df_replace)
}
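Example usage (purely illustrative, with the built-in iris data frame: column 1 is numeric and gets scaled, column 5 is a factor and gets skipped):
scaled_iris <- column_zero_one_range_scale(iris, c(1, 5))
head(scaled_iris)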
I'm trying to reuse a HoltWinters model previously generated in R. I have found a related entry here, but it does not seem to work with HoltWinters. Basically I have tried something like this:
myModel<-HoltWinters(ts(myData),gamma=FALSE)
predict(myModel,n.ahead=10)
#time to change the data
predict(myModel,n.ahead=10,newdata=myNewData)
When I try to predict using the new data I get the same prediction.
I would appreciate any suggestion.
You can use update:
mdl <- HoltWinters(EuStockMarkets[,"FTSE"],gamma=FALSE)
predict(mdl,n.ahead=10)
Time Series:
Start = c(1998, 170)
End = c(1998, 179)
Frequency = 260
fit
[1,] 5451.093
[2,] 5447.186
[3,] 5443.279
[4,] 5439.373
[5,] 5435.466
[6,] 5431.559
[7,] 5427.652
[8,] 5423.745
[9,] 5419.838
[10,] 5415.932
predict(update(mdl,x=EuStockMarkets[,"CAC"]),n.ahead=10)
Time Series:
Start = c(1998, 170)
End = c(1998, 179)
Frequency = 260
fit
[1,] 3995.127
[2,] 3995.253
[3,] 3995.380
[4,] 3995.506
[5,] 3995.633
[6,] 3995.759
[7,] 3995.886
[8,] 3996.013
[9,] 3996.139
[10,] 3996.266
predict.HoltWinters doesn't have a newdata argument, which is why the data doesn't get replaced. This is because the prediction doesn't require any data: it is determined entirely by the coefficients component of the fitted model.
m <- HoltWinters(co2)
m$coefficients #These values describe the model completely;
#adding new data makes no difference
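To make the consequence concrete: getting forecasts for different data means re-estimating the model, either by calling HoltWinters() on the new series or by using update() as in the other answer. A tiny sketch with a built-in series:
m2 <- HoltWinters(AirPassengers)  # re-fit on the new series
predict(m2, n.ahead = 10)         # the forecasts now reflect that series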