What is the following graph called?

o-o-o-o-o-o-...
x x x x x
o-o-o-o-o-o-...
where '-' and 'x' are edges and 'o' are nodes [i.e., two chains of nodes with cross-links at every step].
Does it have mathematical terminology? Bonus points for whoever directs me to any theory related to Markov chains on such graphs.
Thanks!

It appears to be a regular dynamic Bayesian network with two nodes, where both nodes depend on the values of both nodes at the preceding time step. It's a coupled (factorial) Markov model; you can reduce it to a simple single-chain Markov model by defining the outer-product state space $Q = q_1 \times q_2$ and then simply tabulating all the possible transitions from $Q_{t-1}$ to $Q_t$.
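To make the tabulation concrete, here is a minimal R sketch of the reduction. All numbers are made up, and it assumes (as in the DBN above) that each chain's next value depends on both current values and that the two next values are conditionally independent given the current pair:

# Two coupled binary chains; p1[q1, q2, v] ~ P(q1' = v | q1, q2), similarly p2.
p1 <- array(c(0.9, 0.4, 0.6, 0.2,    # P(q1' = 1 | q1, q2)
              0.1, 0.6, 0.4, 0.8),   # P(q1' = 2 | q1, q2)
            dim = c(2, 2, 2))
p2 <- array(c(0.7, 0.3, 0.5, 0.1,    # P(q2' = 1 | q1, q2)
              0.3, 0.7, 0.5, 0.9),   # P(q2' = 2 | q1, q2)
            dim = c(2, 2, 2))
# Tabulate the 4 x 4 transition matrix on the product space Q = q1 x q2.
states <- expand.grid(q1 = 1:2, q2 = 1:2)
P <- matrix(0, 4, 4)
for (s in 1:4) for (t in 1:4) {
  P[s, t] <- p1[states$q1[s], states$q2[s], states$q1[t]] *
             p2[states$q1[s], states$q2[s], states$q2[t]]
}
rowSums(P)  # every row sums to 1: an ordinary single-chain Markov model on Q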


Replication of Erdos-Renyi graph

I'm really new to R and I have got an assignment for my classes. I have to create 1000 networks from the Erdos-Renyi model. The thing is, I can actually create one model, check its parameters like the degree distribution, plot it, etc. I can also check its transitivity and so on. However, I have to compare the average clustering coefficient (local transitivity) of those 1000 networks to a network that we have been working on in class in Cytoscape. This is the code that I already know:
library(igraph)
g<-erdos.renyi.game(1000,2000, type=c("gnm"))
transitivity(g) # and other attributes...
g2<-replicate(g,1000)
transitivity(g2[[1]])
# now I have a sort of list with every graph, but when I try to analyze it
# I get the message that it is not a graph object
I have to calculate the standard deviation and mean ACC over those 1000 networks, and then compare them.
I will appreciate any kind of help.
I tried a lot actually:
g<-erdos.renyi.game(1026,2222,type=c("gnm"))
g2<-1000*g
transitivity(g2[2]) # this however ends in a "not a graph object" error
g2[1] # outputs the adjacency matrix, but instead of 1026 vertices
# I've got 1026000 vertices, so multiplication doesn't replicate the graph,
# it multiplies the parameters
Also, I have tried taking the union of the list of graphs:
glist<-1000*g
acc<-union(glist, byname="auto")
transitivity(acc) # outputs the same value as the first graph g (only one
# Erdos-Renyi graph)
To generate many graphs, use replicate() as below:
g<-erdos.renyi.game(100, 20, type=c("gnm"))
g2<-replicate(1000, erdos.renyi.game(100, 20, type=c("gnm")), simplify=FALSE)
sapply(g2, transitivity)
To calculate the mean of some attribute, like average degree or transitivity, use:
mean(sapply(g2, transitivity))
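For the assignment's comparison, the mean and standard deviation of the average (local) clustering coefficient across the 1000 replicates follow the same pattern; this sketch uses igraph's type = "average" for the mean local transitivity of each graph:

accs <- sapply(g2, transitivity, type = "average")  # average local clustering per graph
mean(accs)  # compare these two numbers against the Cytoscape network's ACC
sd(accs)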

Can PCA or Principal component regression reveal information not seen in the univariate case?

I am wondering if there is a case where you see something in the principal components (PCs) that you do not see by looking univariately at the variables that the PCA is based on. For instance, consider the case of group differences: you see a separation of two groups in one of the PCs, but not in any single variable (univariate).
I will use an example in the two-dimensional setting to better illustrate my question. Let's suppose we have two groups, A and B, and for each observation we have two covariates drawn from a multivariate normal distribution.
# First setting:
library(MASS)  # for mvrnorm
group_A <- mvrnorm(n=1000, mu=c(0,0), Sigma=matrix(c(10,3,3,2),2,2))
group_B <- mvrnorm(n=1000, mu=c(10,3), Sigma=matrix(c(10,3,3,2),2,2))
dat <- rbind(cbind.data.frame(group_A, group="A"),cbind.data.frame(group_B, group="B"))
plot(dat[,1:2], xlab="x", ylab="y", col=dat[,"group"])
In this first setting you see a group separation in the variable x, in the variable y, and also in both principal components. Hence, using PCA we get the same result as in the univariate case: groups A and B have different values in the variables x and y.
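A quick sketch to check this claim, running prcomp on the two covariates only and plotting the scores:

pca <- prcomp(dat[, 1:2], scale. = TRUE)        # PCA on x and y
plot(pca$x, col = factor(dat[, "group"]),       # PC scores, colored by group
     xlab = "PC1", ylab = "PC2")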
In a second example, which I generated myself, you do not see a separation in variable x, in variable y, or in PC1 or PC2. Hence, although our common sense suggests that we can distinguish between the two groups based on x and y, we do not observe this in the univariate case, and the PCA doesn't help us either:
# Second setting
group_A <- mvrnorm(n=1000, mu=c(0,0), Sigma=matrix(c(10,3,3,2),2,2))
group_B <- mvrnorm(n=1000, mu=c(0,0), Sigma=matrix(c(10,-3,-3,2),2,2))
dat <- rbind(cbind.data.frame(group_A, group="A"),cbind.data.frame(group_B, group="B"))
plot(dat[,1:2], xlab="x", ylab="y", col=dat[,"group"])
QUESTION: Is there a case where PCA helps us extract correlations or separations we would not see in the univariate case? Can you construct one, or is this not possible in the two-dimensional case?
Thank you all in advance for helping me to disentangle this.
I think your question is mainly the result of a misunderstanding of what PCA does. It doesn't find clusters in the data like, say, k-means or DBSCAN. It projects n-dimensional data onto an orthogonal basis, then selects the top k dimensions (according to variance explained), where k < n.
So in your example, PCA doesn't know that group A was generated by one distribution and group B by another. It just sees the data in two dimensions and finds two principal components (from which you may or may not select one). You might as well plot all 2000 data points in the same color.
However, if you wanted to use PCA in this instance, you could add a third dimension that distinguishes between group A and group B. You could, for example, label group A +1 and group B -1 (or something that makes sense relative to the scale of the other dimensions), then perform PCA on the 3 dimensions, reducing to 2 or 1 depending on what the eigenvalues tell you about the variation explained.
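A sketch of that idea on the data above; the ±1 coding is an arbitrary choice, not a fixed recipe:

dat$label <- ifelse(dat$group == "A", 1, -1)   # hypothetical +/-1 group coding
pca3 <- prcomp(dat[, c(1, 2, 4)], scale. = TRUE)  # PCA on x, y, and the label
summary(pca3)                                  # eigenvalues: variance explained per PC
plot(pca3$x[, 1:2], col = factor(dat$group))   # the groups now separate along a PC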

Hidden Markov Models methods for selecting optimal number of states

Package RHmm
I have a vector which I fit with an HMM in an attempt to select the optimal number of states for a hidden Markov model.
x<-c(-0.0961421466,-0.0375458485,0.0681121271,0.0259201028,0.0016780785,0.0311860542,
0.0067940299,0.0126520055,0.0357599812,0.0007679569,0.0409759326,0.0560839083,-0.0272581160,-0.0439501404,0.0321578353,0.0196158110,-0.0097262133,-0.0226182376,0.0119897380,-0.0099522863,-0.0359443106,-0.0039363349,-0.0476283592,-0.0383203835,-0.0518624079,0.0187455678,0.0950535435,0.0057115192,-0.0307805051,-0.0272725295,-0.0254645538,-0.0102565781,-0.0267986024,-0.0482906267,-0.0256826510,-0.0414746754,-0.0470666997,0.0284912760,0.1021992517,0.0875572274,0.0064152031,0.0200731787,-0.0091688456,-0.0575608699,-0.0442028942,-0.0277449185,-0.0115369429,0.0084710328,0.0745290085,0.0159369842,-0.0784550401,-0.0934970644,-0.0978390888,0.0160188869,0.0275268626,-0.0552651617,0.0033928140,0.0468507896,0.0374087653,0.0521167410,-0.0177752833,-0.0592673076,0.0514406681,0.0847486437,0.0738066194,-0.0098354049,-0.0572274292,0.0478305465,0.0096885221,-0.0445535022,-0.0153455265,-0.0105375508,0.0100704249,-0.0035215994,0.0243363762,0.0504443519,0.0570023276,0.0395103033,-0.0612817210,-0.0557737453,-0.0273657697,-0.0220077940,0.0083501817,0.0275081574,0.0323161331,0.0385741087,0.0175820844,-0.0410599399,-0.0071019642,0.0431060115,-0.0107360128,-0.0007280372,0.0360799385,-0.0061620858,0.0164458899,-0.0050461344,-0.0578381588,0.0097198169,0.0027277926,-0.0127642317,
-0.0037062560,-0.0045482803,0.0367596953,0.0021176710,-0.0319243533,-0.0194663776,0.0091915981,0.0061495737,-0.0090424506,0.0127655251,0.0161735008,0.0193814765,-0.0208605478,-0.0598025722,0.0022554035,0.0473633792,0.0247213549,-0.0063206694,-0.0201626938,0.0207952819,0.0379032576,0.0151612333,0.0038692090,0.0111271847,0.0497851603,0.0273431360,-0.0172488883,-0.0038909126,0.0264670631,-0.0065249612,-0.0467169856,-0.0255090099,0.0082489658,0.0352569415,0.0272149172,0.0074228928,-0.0040191315,-0.0170611558,-0.0309531801,-0.0327952044,-0.0239372287,-0.0212792531,-0.0132712774,0.0086866983,-0.0007553260,0.0107026497,0.0065106253,-0.0321813990,-0.0081734233,0.0296845524,0.0268925281,-0.0025994962,-0.0038915206,-0.0126335449,0.0040244308,0.0227324065,0.0114903822,-0.0031516422,0.0031563335,0.0137143092,0.0026222849,0.0035802606,0.0111382363,-0.0008037881,-0.0282458124,0.0056121633,0.0254201390,0.0033781147,-0.0166139097,-0.0124559340,0.0088520417,0.0072600174,-0.0050320069,-0.0114740312,-0.0066160556,-0.0042080799,-0.0205501042,0.0027078715,0.0122158472,-0.0206261771,-0.0267682015,-0.0107602258,0.0088477499,0.0165057256,0.0106637013,0.0115216769,0.0278296526,0.0026376283,-0.0231543960,-0.0141964203)
library(RHmm)
# partition into train/test
nhs <- c(2,3,4) # candidate numbers of states
train <- runif(length(x)) <= .66
# mean conditional density of the log probability of seeing the held-out part of the sequence
error <- numeric(length(nhs))
for(i in 1:length(nhs)){
  pred <- vector("list", length(x))
  for(fold in 1:length(x)){
    fit <- HMMFit(x[which(train==TRUE)], dis="NORMAL", nStates=nhs[i],
                  asymptCov=FALSE)
    pred[[fold]] <- forwardBackward(fit, x[which(train==FALSE)])
  }
  error[i] <- pred[[fold]]$LLH
}
nhs[which.max(error)] # Optimal number of hidden states (method: max held-out log-likelihood)
Every time I run the model trying to obtain the best number of states for the hidden Markov model, I get a different number of states, as I believe the model is trained on randomly selected new values each time. This does not happen if I just fit the model.
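One way to reduce this split-to-split variability (a sketch on my part, not part of the original code; it reuses the forwardBackward/$LLH calls from above) is to repeat the random partition many times and take the modal choice:

pick_states <- function(x, nhs = 2:4, frac = 0.66) {
  train <- runif(length(x)) <= frac
  ll <- sapply(nhs, function(n) {
    fit <- HMMFit(x[train], dis = "NORMAL", nStates = n, asymptCov = FALSE)
    forwardBackward(fit, x[!train])$LLH   # held-out log-likelihood
  })
  nhs[which.max(ll)]
}
picks <- replicate(50, pick_states(x))
table(picks)   # the most frequent choice is a more stable answer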
# score proportional to the probability that a sequence is generated by a given model
nhs <- c(2,3,4)
error <- numeric(length(nhs))
for(i in 1:length(nhs)){
  fit <- HMMFit(x, dis="NORMAL", nStates=nhs[i], asymptCov=FALSE)
  VitPath <- viterbi(fit, x) # most likely state path (not used in the selection below)
  error[i] <- fit[[3]]
}
error[is.na(error)] <- 10000 # penalize fits that failed
nhs[which.min(error)] # Optimal number of hidden states (method: min AIC)
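If the intent of fit[[3]] is an information criterion, it may be safer to refer to the named element of the fit object; I believe RHmm's fit exposes $LLH and $BIC, but treat the field name as an assumption to check against your installed version:

bic <- sapply(nhs, function(n)
  HMMFit(x, dis = "NORMAL", nStates = n, asymptCov = FALSE)$BIC)
nhs[which.min(bic)]   # smallest BIC = best penalized fit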
However, the results are very different. Which one is better? On the one hand, I have a model that I can test on new samples; on the other hand, the second provides the best fit on the seen samples. Moreover, if I repeat the first procedure, the training/test split changes (it is random) and the resulting number of states changes as well. What sample/training percentage should I use to be confident that the chosen number of states will generalize?
What additional methods could I employ to select an optimal number of states?
Many thanks
Recurrence quantification analysis (RQA) is a method of nonlinear data analysis which quantifies the number and duration of recurrences of a dynamical system presented by its state-space trajectory.
These measures can be computed in windows along the main diagonal. This makes it possible to study their time dependence, and they can be used for the detection of transitions (vertical or horizontal points indicate chaos-chaos transitions; diagonal structures indicate chaos-order or order-chaos transitions). The lengths of diagonal lines in a recurrence plot (RP) are directly related to the degree of determinism or predictability inherent in the system.
Another possibility is to define these measures for each diagonal parallel to the main diagonal separately. This approach enables the study of time delays and unstable periodic orbits, as well as the assessment of similarities between processes: measures based on diagonal structures are able to find chaos-order transitions, while measures based on vertical (horizontal) structures are able to find chaos-chaos transitions.
A cross recurrence plot (CRP) is the equivalent of cross-phase analysis with wavelets. A CRP is a graph which shows all those times at which a state in one dynamical system occurs simultaneously in a second dynamical system. In other words, the CRP reveals all the times when the phase-space trajectory of the first system visits roughly the same area in phase space that the phase-space trajectory of the second system does. It is this last analysis that can provide a determination of the optimal number of hidden states.
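As a concrete starting point, a recurrence matrix can be computed in a few lines of base R; the embedding dimension and threshold below are arbitrary illustrative choices, not tuned values:

emb <- embed(x, 3)                       # delay embedding of the series (dimension 3)
R <- as.matrix(dist(emb)) < 0.02         # recurrence: pairs of states closer than the threshold
image(R * 1, col = c("white", "black"))  # diagonal/vertical structures feed the RQA measures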

Probability to visit nodes in a random walk on graph

I have a finite undirected graph in which a node is marked as "start" and another is marked as "goal".
An agent is initially placed at the start node and it navigates through the graph randomly, i.e. at each step it chooses uniformly at random a neighbor node and moves to it.
When it reaches the goal node it stops.
I am looking for an algorithm that, for each node, estimates the probability that the agent visits it while traveling from start to goal.
Thank you.
As is often the case with graphs, it's simply a matter of knowing an appropriate way to describe the problem.
One way of writing a graph is as an adjacency matrix. If your graph G = (V, E) has |V| nodes (where |V| is the number of vertices), then this matrix will be |V| x |V|. If an edge exists between a pair of vertices, you set the corresponding entry in the adjacency matrix to 1, and to 0 if it isn't present.
A natural extension of this is to weighted graphs. Here, rather than 0 or 1, the entries of the adjacency matrix carry weights.
In the case you're describing, you have a weighted graph where the weights are the probabilities of transitioning from one node to another. This type of matrix has a special name: it is a stochastic matrix. Depending on how you've arranged your matrix, either its rows or its columns will sum to 1, making it a right or left stochastic matrix respectively.
One link between stochastic matrices and graphs is Markov chains. In the Markov chain literature, the critical thing you need is a transition matrix (the adjacency matrix with weights equal to the probabilities of transition after one time step). Let's call the transition matrix P.
The probability of transitioning from one state to another after k time steps is given by P^k. If you have a known source state i, then the i-th row of P^k gives you the probability of transitioning to any other state. This gives you an estimate of the probability of being in a given state in the short term.
Depending on your source graph, it may be that P^k reaches a steady-state distribution, that is, P^k = P^(k+1) for some value of k. This gives you an estimate of the probability of being in a given state in the long term.
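A small sketch of that machinery; the 4-node adjacency matrix is made up for illustration:

A <- matrix(c(0, 1, 1, 0,   # hypothetical undirected graph on 4 nodes
              1, 0, 1, 1,
              1, 1, 0, 1,
              0, 1, 1, 0), 4, 4, byrow = TRUE)
P <- A / rowSums(A)         # right stochastic: uniform move to a random neighbor
Pk <- diag(4)
for (step in 1:20) Pk <- Pk %*% P   # Pk is now P^20
Pk[1, ]                     # distribution over nodes after 20 steps from node 1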
As an aside, before you do any of this, you should be able to look at your graph, and say some things about what the probability of being in a given state is at some time.
If your graph has disjoint components, the probability of being in a component that you didn't start in is zero.
If your graph has some states that are absorbing, that is, some states (or groups of states) that are inescapable once you've entered them, then you'll need to account for that. This may happen if your graph is tree-like.
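For the start-to-goal setting in the question, one standard way to account for the stopping rule (my addition, not spelled out above) is to make the goal state absorbing before iterating, so that P^k accumulates the probability of having reached the goal; reusing P from the sketch above:

goal <- 4
Pabs <- P
Pabs[goal, ] <- 0
Pabs[goal, goal] <- 1               # the agent stays put once it reaches the goal
Pk <- diag(4)
for (step in 1:200) Pk <- Pk %*% Pabs
Pk[1, goal]   # probability of having reached the goal from node 1 within 200 steps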

What is the formula for the number of different unlabeled trees that can be formed from a given set of nodes?

I am doing research on a project and came across a problem. I would be very grateful if anybody could help me out with this. Consider the figure below:
Two dots joined by a line result in only one diagram; three dots joined by single lines also result in one figure, because no matter how you join the dots the result is the same. But as we increase the number of dots there are more possibilities, as seen with four dots.
Is there a formula for counting the number of unlabeled trees that can be formed from a set of nodes?
As suggested in the comments, your question can be phrased as determining the number of unlabeled trees on $n$ vertices. Notice this differs significantly from the question of counting labeled trees (of which there are $n^{n-2}$) or labeled graphs (of which there are $2^{\binom{n}{2}}$).
The Online Encyclopedia of Integer Sequences has a lot of good data about this problem (including code to generate the sequence): https://oeis.org/A000055. In particular, it gives a generating function A(x) for these numbers, which is the best solution known to date (from a mathematician's perspective):
$A(x) = 1 + T(x) - T^2(x)/2 + T(x^2)/2$, where $T(x) = x + x^2 + 2x^3 + \cdots$ is the generating function for rooted trees.
If you are not familiar with generating functions, think of one as a carefully designed power series whose coefficients form the desired sequence. That is, the coefficient of $x^n$ in this series is the number of unlabeled trees on $n$ vertices.
As a final plug, you may find this reference useful: http://austinmohr.com/work/trees. It gives some counts and images for trees of up to ten vertices.
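If you want the numbers themselves, the recurrences behind that generating function can be coded directly. Here is an illustrative R sketch, not production code: it first counts rooted unlabeled trees (OEIS A000081) by the standard divisor-sum recurrence, then applies Otter's formula (the generating-function identity above) to get unrooted counts:

unlabeled_trees <- function(nmax) {     # assumes nmax >= 2
  r <- numeric(nmax); r[1] <- 1         # r[n] = number of rooted unlabeled trees on n nodes
  for (n in 2:nmax) {
    s <- 0
    for (k in 1:(n - 1)) {
      d <- which(k %% seq_len(k) == 0)  # divisors of k
      s <- s + sum(d * r[d]) * r[n - k]
    }
    r[n] <- s / (n - 1)
  }
  tn <- numeric(nmax)                   # tn[n] = number of unrooted (free) trees
  for (n in 1:nmax) {
    conv <- if (n >= 2) sum(r[1:(n - 1)] * r[(n - 1):1]) else 0
    tn[n] <- r[n] - conv / 2 + (if (n %% 2 == 0) r[n / 2] / 2 else 0)
  }
  tn
}
unlabeled_trees(10)   # 1 1 1 2 3 6 11 23 47 106, matching A000055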
This is the non-isomorphic graph counting problem.
For the general case, there are $2^{\binom{n}{2}}$ labeled graphs on $n$ vertices, where $\binom{n}{2}$ is the binomial coefficient "n choose 2".
However, that count includes many extra graphs, depending on which graphs are considered the same (you also were not 100% clear about which graphs apply).
See this paper.
And this article on MathWorld.
EDIT: In case you want to count labeled trees only, the formula is $n^{n-2}$ (Cayley's formula).
Wikipedia.
