How can we calculate Emission probabilities for a Hidden Markov Model (HMM) in R?
To calculate transition probabilities we use the seqtrate function (from the TraMineR package):
tr <- seqtrate(exampledata)
This function returns a transition matrix. Here exampledata is sequence data.
Is there a function that returns an emission matrix?
Please have a look at R's HMM package: https://cran.r-project.org/web/packages/HMM/HMM.pdf
It includes an example like this:
hmm = initHMM(c("A","B"), c("L","R"), transProbs=matrix(c(.8,.2,.2,.8),2),
emissionProbs=matrix(c(.6,.4,.4,.6),2))
print(hmm)
# Sequence of observations
observation = c("L","L","R","R")
baumWelch(hmm, observation, maxIterations=100, delta=1E-9, pseudoCount=0)
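The baumWelch function returns the re-estimated model. If you store the result, e.g. bw = baumWelch(...), the updated emission probabilities are in bw$hmm$emissionProbs.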
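If the states are actually known for the training sequences (an assumption; with truly hidden states you need Baum-Welch as above), the emission matrix can also be estimated by simple counting, much like a transition-rate matrix. A minimal base-R sketch, where the `states` and `obs` vectors are made-up illustration data:

```r
# Hypothetical labelled data: one state and one observed symbol per time step
states <- c("A", "A", "B", "B", "A", "B", "B", "A")
obs    <- c("L", "L", "R", "R", "L", "R", "L", "R")

# Count how often each symbol is emitted in each state,
# then normalise each row so that it sums to 1
counts   <- table(state = states, symbol = obs)
emission <- prop.table(counts, margin = 1)
print(emission)
```

Each row of `emission` is then the estimated distribution of symbols for one state, and could be passed as emissionProbs to initHMM.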
I created the following non-homogeneous Hidden Markov Model using depmix:
#primary time series
datats<-ts(data$primary)
y<- as.numeric(datats)
#Preparing covariates
cov1ts<-ts(data$cov1)
x1<- as.numeric(cov1ts)
cov2ts<-ts(data$cov2)
x2<- as.numeric(cov2ts)
#Build model
hmm_model <- depmix(y~1, data = data.frame(y), nstates = 2, transition = ~ (scale(x1)+scale(x2)))
hmm_model <-fit(hmm_model)
summary(hmm_model)
I now want to make a prediction about the next state. In the past I did this using homogeneous HMM as explained in this post:
How to predict out-of-sample observations with depmixS4 package in R?
Specifically, in my case I did:
#[...] Created homogeneous model "hom_model" like before but without transition parameter
#transition probabilities at states
mat1<-hom_model@transition[[1]]@parameters$coefficients
mat2<-hom_model@transition[[2]]@parameters$coefficients
#transition matrix
transmat<-rbind(mat1, mat2)
# prediction as described in post, not very relevant for this question
But for the non-homogeneous HMM I cannot obtain the transition matrix in the same way: mat1 and mat2 now contain the intercept and covariate coefficients for each state. Specifically, my output for mat1 in the non-homogeneous case looks like this:
            St1        St2
(Intercept)   0 -0.6704946
scale(x1)     0 -1.7279190
scale(x2)     0 -2.0905961
I am unsure how to obtain the transition matrix in the non-homogeneous case, and also a bit confused as to why the State 1 coefficients are all 0.
Thank you
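A possible reading of that output, offered as a sketch rather than a definitive answer: as far as I understand, depmixS4 models covariate-dependent transitions with a multinomial logit in which the first state is the baseline, so its coefficients are fixed at 0 for identifiability. Under that assumption, one row of the transition matrix at a given time step is the softmax of the linear predictors. The covariate values in `x_t` and the helper `trans_row` are hypothetical:

```r
# Coefficients for transitions out of state 1, copied from the output above
coef_from_1 <- matrix(c(0, 0, 0, -0.6704946, -1.7279190, -2.0905961),
                      nrow = 3,
                      dimnames = list(c("(Intercept)", "scale(x1)", "scale(x2)"),
                                      c("St1", "St2")))

# Hypothetical covariate values at the time step of interest,
# already scaled with the mean and sd of the training data
x_t <- c(1, 0.5, -0.2)   # intercept, scale(x1), scale(x2)

# Inverse multinomial logit (softmax) of the linear predictors
trans_row <- function(coefs, x) {
  eta <- as.vector(x %*% coefs)
  exp(eta) / sum(exp(eta))
}

row1 <- trans_row(coef_from_1, x_t)   # P(St1 -> St1), P(St1 -> St2) at this time step
print(row1)
```

The second row would come from the coefficients of transition[[2]] in the same way, so the transition matrix changes with the covariates at every time step.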
Good afternoon,
I have a series of annual maxima data (say "AMdata") I'd like to model through a non-stationary GEV distribution. In particular, I want the location to vary linearly in time, i.e.:
mu = mu0 + mu1*t.
To this end, I am using the ismev package in R, computing the parameters as follows:
require(ismev)
ydat = cbind(1:length(AMdata)) ### Co-variates - years from 1 to number of annual maxima in the data
GEV_fit_1_loc = gev.fit(xdat=AMdata,ydat=ydat,mul=1)
This way I obtain 4 parameters, namely mu0, mu1, shape and scale.
My question is: can I apply the gev.fit function with mu1 fixed at a given value? Not as a starting value for the iterations, but as a known parameter (so that only the three remaining parameters are estimated)?
Any tip would be really appreciated!
Francesco
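One possible workaround, offered as a sketch rather than a feature of ismev: if mu1 is known, then AMdata - mu1*t follows a stationary GEV with location mu0, so the three remaining parameters can be fitted to the detrended series (with ismev that would be gev.fit(xdat = AMdata - mu1_fixed * t)). The base-R version below writes the likelihood out by hand and uses optim; the simulated data, `mu1_fixed` and the helper `gev_nll` are assumptions for illustration only.

```r
set.seed(1)

# Simulated annual maxima with a known linear trend in the location (illustration only)
n <- 200
t_idx <- 1:n
mu0_true <- 10; mu1_fixed <- 0.05; sigma_true <- 2; xi_true <- 0.1
u <- runif(n)
AMdata <- mu0_true + mu1_fixed * t_idx +
  sigma_true * ((-log(u))^(-xi_true) - 1) / xi_true

# Remove the known trend; what is left is stationary GEV(mu0, sigma, xi)
y <- AMdata - mu1_fixed * t_idx

# Negative log-likelihood of a stationary GEV; par = (mu0, log(sigma), xi)
gev_nll <- function(par, y) {
  mu <- par[1]; sigma <- exp(par[2]); xi <- par[3]
  if (abs(xi) < 1e-8) return(1e10)   # Gumbel limit not handled in this sketch
  z <- 1 + xi * (y - mu) / sigma
  if (any(z <= 0)) return(1e10)      # observation outside the support
  length(y) * log(sigma) + (1 + 1 / xi) * sum(log(z)) + sum(z^(-1 / xi))
}

# Moment-based starting values (Gumbel approximation), shape started at 0.1
start <- c(mean(y) - 0.57722 * sqrt(6 * var(y)) / pi,
           log(sqrt(6 * var(y)) / pi),
           0.1)
fit <- optim(start, gev_nll, y = y)
est <- c(mu0 = fit$par[1], sigma = exp(fit$par[2]), xi = fit$par[3])
print(est)
```

Only mu0, sigma and xi are estimated here; mu1 enters solely through the detrending step and is never updated.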
I am computing a Principal Component Analysis with this matrix as input, using the function psych::principal. Each column in the input data holds the monthly correlations between crop yields and a climatic variable in one region (30 regions), so what I want from the PCA is to reduce the information and find similar patterns of response between regions.
pc <- principal(dat,nfactors = 9, residuals = FALSE, rotate="varimax", n.obs=NA, covar=TRUE,scores=TRUE, missing=FALSE, impute="median", oblique.scores=TRUE, method="regression")
The matrix has dimensions 10*30, and the first message I get is:
The determinant of the smoothed correlation was zero. This means the
objective function is not defined. Chi square is based upon observed
residuals. The determinant of the smoothed correlation was zero. This
means the objective function is not defined for the null model either.
The Chi square is thus based upon observed correlations. Warning
messages: 1: In cor.smooth(r) : Matrix was not positive definite,
smoothing was done 2: In principal(dat, nfactors = 3, residuals = F,
rotate = "none", : The matrix is not positive semi-definite, scores
found from Structure loadings
Nonetheless, the function seems to work. The main problem appears when you check pc$weights and realize that it is equal to pc$loadings.
When the number of columns is less than or equal to the number of rows the results are coherent, but that is not the case here.
I need the weights in order to express the score values on the same scale as the input data (correlation values).
I would really appreciate any help.
Thank you.
I am doing a cluster analysis on the most significant components.
To find the number of clusters I apply the Calinski-Harabasz (CH) index. I have two questions:
Do I need to normalize the components before clustering? I haven't done so, as the variance expresses the importance of a component.
Concerning the CH index, do I calculate it on the original data or on the output of my PCA function? To clarify:
pca <- prcomp(data_scaled)
pca$x
Here I use pca$x for the cluster analysis. Should I use the data_scaled dataset or the pca$x dataset for calculating the CH index?
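One fact that may help with the second question: PCA with all components kept is only a rotation of the centred data, so Euclidean distances, and hence the CH index for a given cluster assignment, are the same on data_scaled and on the full pca$x. The two differ only when a subset of components is kept. A minimal base-R check, where `ch_index` is a hand-written helper and iris is stand-in data (both are assumptions, not part of the question):

```r
# Calinski-Harabasz index: (between SS / (k - 1)) / (within SS / (n - k))
ch_index <- function(X, cl) {
  X <- as.matrix(X)
  n <- nrow(X)
  k <- length(unique(cl))
  total_ss  <- sum(scale(X, scale = FALSE)^2)
  within_ss <- sum(sapply(split(as.data.frame(X), cl),
                          function(g) sum(scale(g, scale = FALSE)^2)))
  ((total_ss - within_ss) / (k - 1)) / (within_ss / (n - k))
}

data_scaled <- scale(iris[, 1:4])   # stand-in for the real data
pca <- prcomp(data_scaled)

set.seed(42)
cl <- kmeans(pca$x, centers = 3, nstart = 10)$cluster

ch_scaled <- ch_index(data_scaled, cl)    # index on the scaled data
ch_full   <- ch_index(pca$x, cl)          # index on all components
ch_two    <- ch_index(pca$x[, 1:2], cl)   # index on the first two components only
print(c(scaled = ch_scaled, all_pcs = ch_full, first_two = ch_two))
```

If the clustering itself is run on a reduced set of components, it seems most consistent to compute the index on that same reduced set, since that is the space in which the clusters were formed.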
By "compound" I mean that the transition matrix satisfies the Markov property. Namely, I have two columns, s_t and s_t+k, that represent the state of each individual in two periods, t and t+k respectively.
What I want is to find the matrix M such that
s_t+k = M^k * s_t
so that matrix M satisfies the Markov property.
My default working language is Stata, in which commands like tab, svy:tab or xttran can generate one-period transition matrices, but these matrices do not necessarily satisfy the Markov property. So I wonder how to achieve my goal in Stata or another common language like R or Python.
PS: This problem arises from a paper that studies many countries' GDP per capita transition dynamics from 1960 to 2010. Say, at the beginning of each decade, we group all countries into 5 groups (from 1: extremely poor country to 5: high-income country), so we have a distribution of countries over 5 states. It is easy to simply estimate the decade-to-decade transition matrix using the markovchain class. However, the author claims that (page 11, footnote 4)
“The decade average transition matrix is estimated based on
the 5-decade transition matrices from 1960 to 2010 by employing
a numerical optimization program. Instead of taking the simple average
for the five transition matrices (which suffers from Jensen’s
Inequality), we estimate a transition matrix that can give us an exact
5 decade duration transition matrix (entry in 1960 and exit in 2010)
by taking its power 5.”
In R you can use the markovchain package to estimate a transition matrix that satisfies the Markov property. You can use the following example code:
library(markovchain)
data(rain)
mysequence<-rain$rain
createSequenceMatrix(mysequence)
myFit<-markovchainFit(data=mysequence,method="bootstrap",nboot=5, name="Bootstrap Mc")
myFit
myFit$estimate is your estimated transition matrix. This example uses the Alofi rainfall dataset.
Matrix multiplication in R is %*%, not *.
I wrote a simple function in R to solve the problem.
trans_mat = function(k,s_t,M){
# start from the identity so that the loop builds M^k
Mk = diag(nrow(M))
for(i in seq_len(k)){
Mk = Mk %*% M
}
return(Mk%*%s_t)}
Now all you need to do is supply k (the number of periods), s_t (the initial state vector), and M (the one-period transition matrix).
s_t_plus_k = trans_mat(k,s_t,M)
The markovchain package directly implements the power for any markovchain object:
require(markovchain)
#creating the MC
myMatr<-matrix(data=c(0.2,0.8,.6,.4),ncol=2,byrow=TRUE)
myMc<-as(myMatr,"markovchain")
#5th power of the MC
myMc5<-myMc^5
myMc5
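For the inverse problem in the question (finding M such that M^k equals an observed k-period matrix), one simple option is a matrix k-th root via eigendecomposition. This is a base-R sketch on a made-up 2x2 matrix, not the paper's method, which uses numerical optimization. It assumes the k-period matrix is diagonalisable with positive real eigenvalues; in general the root can have negative or complex entries, which is presumably why an optimizer is used instead.

```r
# Illustrative one-period matrix (rows sum to 1) and its 5-period version
M_true <- matrix(c(0.9, 0.1,
                   0.2, 0.8), ncol = 2, byrow = TRUE)
k <- 5
P_k <- diag(2)
for (i in seq_len(k)) P_k <- P_k %*% M_true

# k-th root via eigendecomposition: P_k = V diag(lambda) V^-1,
# so M = V diag(lambda^(1/k)) V^-1
e <- eigen(P_k)
M_hat <- Re(e$vectors %*% diag(e$values^(1 / k)) %*% solve(e$vectors))
print(round(M_hat, 4))
```

Before using such a root as a transition matrix, check that all entries lie between 0 and 1 and that each row sums to 1.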