How to generate n MarkovChain sequences of 25 transitions each in R

I have a transition matrix "T" and would like to produce 20 different sequences of 25 states each.
I have the markovchain package and have tried the following:
lapply(1:20,markovchainSequence(n = 25, markovchain = T, t0 = "In"))
but it says that markovchainSequence is not a function. Is there a way around this, please?

A reproducible example would really help here, but I think this does the job. You may just need a bigger transition matrix.
set.seed(123)
statesNames <- c("a", "b", "c") #easier with three states
t <- new("markovchain", states = statesNames,
transitionMatrix = matrix(c(0.2, 0.5, 0.3, 0, 0.2, 0.8, 0.1, 0.8, 0.1),
nrow = 3, byrow = TRUE, dimnames = list(statesNames, statesNames)))
mchain = function(n){
markovchainSequence(n = n, markovchain = t, t0 = "a")
}
lapply(rep(25, each=20), mchain) # you may change 25 to desired number
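For context, lapply expects a function as its second argument. In the call from the question, markovchainSequence(...) is evaluated first and its result (a character vector) is passed instead, which is why R complains that it is not a function. Below is a minimal base-R sketch of the same idea without the package. Note that simulate_chain is a hypothetical helper written for illustration, not part of markovchain, and the matrix P simply repeats the three-state example above.

```r
# Hypothetical helper (not part of the markovchain package): simulate one
# sequence of n states from a row-stochastic transition matrix P.
simulate_chain <- function(n, P, t0) {
  states <- rownames(P)
  out <- character(n)
  current <- t0
  for (i in seq_len(n)) {
    # Draw the next state, using the row of the current state as probabilities
    current <- sample(states, size = 1, prob = P[current, ])
    out[i] <- current
  }
  out
}

set.seed(123)
states <- c("a", "b", "c")
P <- matrix(c(0.2, 0.5, 0.3,
              0.0, 0.2, 0.8,
              0.1, 0.8, 0.1),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))

# replicate() re-evaluates the expression on every repetition, so no wrapper
# function is needed; simplify = FALSE keeps the result as a list.
sequences <- replicate(20, simulate_chain(25, P, t0 = "a"), simplify = FALSE)
length(sequences)      # 20
lengths(sequences)[1]  # 25
```

The same replicate(..., simplify = FALSE) pattern should also work with markovchainSequence in place of the hypothetical helper.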

Related

How do I graph a Bayesian Network with instantiated nodes using bnlearn and graphviz?

I am trying to graph a Bayesian Network (BN) with instantiated nodes using the libraries bnlearn and Rgraphviz. My workflow is as follows:
After creating a data frame with random data (the data I am actually using is obviously not random) I then discretise the data, structure learn the directed acyclic graph (DAG), fit the data to the DAG and then plot the DAG. I also plot a DAG which shows the posterior probabilities of each of the nodes.
#rm(list = ls())
library(bnlearn)
library(Rgraphviz)
# Generating random dataframe
data_clean <- data.frame(a = runif(min = 0, max = 100, n = 1000),
b = runif(min = 0, max = 100, n = 1000),
c = runif(min = 0, max = 100, n = 1000),
d = runif(min = 0, max = 100, n = 1000),
e = runif(min = 0, max = 100, n = 1000))
# Discretising the data into 3 bins
bins <- 3
data_discrete <- discretize(data_clean, breaks = bins)
# Creating factors for each bin in the data
lv <- c("low", "med", "high")
for (i in names(data_discrete)){
levels(data_discrete[, i]) = lv
}
# Structure learning the DAG from the training set
whitelist <- matrix(c("a", "b",
"b", "c",
"c", "e",
"a", "d",
"d", "e"),
ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to")))
bn.hc <- hc(data_discrete, whitelist = whitelist)
# Plotting the DAG
dag.hc <- graphviz.plot(bn.hc,
layout = "dot")
# Fitting the data to the structure
fitted <- bn.fit(bn.hc, data = data_discrete, method = "bayes")
# Plotting the DAG with posteriors
graphviz.chart(fitted, type = "barprob", layout = "dot")
The next thing I do is to manually change the distributions in the bn.fit object, assigned to fitted, and then plot a DAG that shows the instantiated nodes and the updated posterior probability of the response variable e.
# Manually instantiating
fitted_evidence <- fitted
cpt.a = matrix(c(1, 0, 0), ncol = 3, dimnames = list(NULL, lv))
cpt.c = c(1, 0, 0,
0, 1, 0,
0, 0, 1)
dim(cpt.c) <- c(3, 3)
dimnames(cpt.c) <- list("c" = lv, "b" = lv)
cpt.b = c(1, 0, 0,
0, 1, 0,
0, 0, 1)
dim(cpt.b) <- c(3, 3)
dimnames(cpt.b) <- list("b" = lv, "a" = lv)
cpt.d = c(0, 0, 1,
0, 1, 0,
1, 0, 0)
dim(cpt.d) <- c(3, 3)
dimnames(cpt.d) <- list("d" = lv, "a" = lv)
fitted_evidence$a <- cpt.a
fitted_evidence$b <- cpt.b
fitted_evidence$c <- cpt.c
fitted_evidence$d <- cpt.d
# Plotting the DAG with instantiation and posterior for response
graphviz.chart(fitted_evidence, type = "barprob", layout = "dot")
This is the result I get but my actual BN is much larger with many more arcs and it would be impractical to manually change the bn.fit object.
I would like to find out if there is a way to plot a DAG with instantiation without changing the bn.fit object manually? Is there a workaround or function that I am missing?
I think/hope I have read the documentation for bnlearn thoroughly. I appreciate any feedback and would be happy to change anything in the question if I have not conveyed my thoughts clearly enough.
Thank you.
How about using cpdist to draw samples from the posterior given the evidence? You can then re-estimate the parameters by passing the cpdist samples to bn.fit, and plot as before.
An example:
set.seed(69184390) # for sampling
# Your evidence vector
ev <- list(a = "low", b="low", c="low", d="high")
# draw samples
updated_dat <- cpdist(fitted, nodes=bnlearn::nodes(fitted), evidence=ev, method="lw", n=1e6)
# refit : you'll get warnings over missing levels
updated_fit <- bn.fit(bn.hc, data = updated_dat)
# plot
par(mar=rep(0,4))
graphviz.chart(updated_fit, type = "barprob", layout = "dot")
Note I used bnlearn::nodes as nodes is masked by a dependency of Rgraphviz. I tend to load bnlearn last.

R. lapply multinomial test to list of dataframes

I have a data frame A, which I split into a list of 100 data frames, each having 3 rows (In my real data each data frame has 500 rows). Here I show A with 2 elements of the list (row1-row3; row4-row6):
A <- data.frame(n = c(0, 1, 2, 0, 1, 2),
prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
count = c(24878, 33605, 12100 , 25899, 34777, 13765))
# This is the list:
nest <- split(A, rep(1:2, each = 3))
I want to apply the multinomial test to each of these data frames and extract the p-value of each test. So far I have done this:
library(EMT)
fun <- function(x){
multinomial.test(x$count,
prob=x$prob,
useChisq = FALSE, MonteCarlo = TRUE,
ntrial = 100, # n of withdrawals accomplished
atOnce=100)
}
lapply(nest, fun)
However, I get:
"Error in multinomial.test(x$counts_set, prob = x$norm_genome, useChisq = F, :
Observations have to be stored in a vector, e.g. 'observed <- c(5,2,1)'"
Does anyone have a smarter way of doing this?
split returns a list whose elements are named 1, 2 and so on. One way to make this explicit is to collect the split elements with the list function and then use lapply:
n <- c(0,1,2,0,1,2)
prob <- c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1)
count <- c(24878, 33605, 12100 , 25899, 34777, 13765)
A <- cbind.data.frame(n, prob, count)
nest = split(A,rep(1:2,each=3))
fun <- function(x){
multinomial.test(x$count,
prob=x$prob,
useChisq = FALSE, MonteCarlo = TRUE,
ntrial = 100, # n of withdrawals accomplished
atOnce=100)
}
# Create a list of the split elements
new_list <- list(nest$`1`, nest$`2`)
lapply(new_list, fun)
A solution with dplyr.
A = data.frame(n = c(0,1,2,0,1,2),
prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
count = c(43, 42, 9, 74, 82, 9))
library(EMT) # provides multinomial.test
library(dplyr)
nest <- A %>%
mutate(pattern = rep(1:2,each=3)) %>%
group_by(pattern) %>%
dplyr::summarize(mn_pvals = multinomial.test(count, prob)$p.value)
nest
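As a minimal sketch of the split-then-apply pattern in base R, vapply can collect one p-value per data frame into a named numeric vector. To keep the sketch runnable without extra packages, it swaps in chisq.test from the stats package as a stand-in for EMT's multinomial.test. That is a different test (a chi-squared goodness-of-fit test), so treat the numbers as illustrative only.

```r
# Same layout as in the question: two blocks of three rows each
A <- data.frame(n = c(0, 1, 2, 0, 1, 2),
                prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
                count = c(24878, 33605, 12100, 25899, 34777, 13765))
nest <- split(A, rep(1:2, each = 3))

# Stand-in test (assumption for illustration): chisq.test() replaces
# EMT::multinomial.test() so that no extra package is required.
pvals <- vapply(nest,
                function(x) chisq.test(x$count, p = x$prob)$p.value,
                numeric(1))
pvals         # named numeric vector, one p-value per data frame
names(pvals)  # "1" "2"
```

vapply is used instead of lapply so the result is a plain vector and a non-numeric return value raises an error early.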

Sampling iteratively without a for loop in R

Even though I think the issue I have may be simple, I nevertheless can't figure it out. Here's the thing:
I have the following list and vector. The list is used to fill up the vector:
probabilities = list(c(0.2, 0.3, 0.5), c(0.1, 0.1, 0.8), c(0.3,0.4,0.3))
nextState = c()
for(counter in 1:3){
nextState[counter] = sample(1:3, size = 1, prob = probabilities[[counter]])
}
The code works fine. However, when expanding to larger lists (>10,000 elements), the loop becomes aggravatingly slow. Since the loop above is used multiple times in the larger code, the time consumed is way too much. Would there be a way to achieve the same result without looping?
Additional question:
Thanks guys, you've been a big help. One additional question: how would you approach the same issue if the probabilities and the nextState were interdependent? That is, how could I avoid the for loop? Perhaps some code to clarify:
M <- list(matrix(c(0.1, 0.2, 0.7, 0.2, 0.2, 0.6, 0.3, 0.3, 0.4), nrow = 3, ncol = 3),
matrix(c(0.3, 0.3, 0.4, 0.5, 0.5, 0, 0.1, 0.1, 0.8), nrow = 3, ncol = 3))
probabilities <- list()
nextState <- c(2, NA, NA)
for(i in 1:2){
probabilities[[i]] <- M[[i]][nextState[i], ]
nextState[i + 1] <- sample(1:3, size = 1, prob = probabilities[[i]])
}
If you've got any idea, then you truly are miracle workers!!
Try sapply:
nextState <- sapply(probabilities, function(x) {sample(1:3, size = 1, prob = x)})
Benchmarks:
# Unit: microseconds
# expr min lq mean median uq max neval
# for 2115.170 2223.475 2436.0797 2283.2755 2371.546 10048.64 100
# sapply 24.704 29.524 164.0261 37.3565 41.123 12763.03 100
microbenchmark::microbenchmark(
`for` = {
nextState = c()
for(counter in 1:3){
nextState[counter] = sample(1:3, size = 1, prob = probabilities[[counter]])
}
},
sapply = sapply( probabilities, function(x) {sample(1:3, size = 1, prob = x)}),
times = 100)
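If every element of the list has the same number of states, the independent draws can also be done without calling sample once per element, using inverse-CDF sampling on a matrix. This is only a sketch under that assumption and I have not benchmarked it against the versions above. The names P, cum and u are introduced here for illustration.

```r
set.seed(1)
probabilities <- list(c(0.2, 0.3, 0.5), c(0.1, 0.1, 0.8), c(0.3, 0.4, 0.3))

# One row per draw, one column per state
P <- do.call(rbind, probabilities)

# Row-wise cumulative probabilities; force the last column to exactly 1
# to guard against floating-point round-off
cum <- t(apply(P, 1, cumsum))
cum[, ncol(cum)] <- 1

# One uniform draw per row; the sampled state is the first column whose
# cumulative probability is at least u (u is recycled down the columns,
# so u[i] is compared against every entry of row i)
u <- runif(nrow(P))
nextState <- rowSums(u > cum) + 1L
nextState
```

This assumes each probability vector sums to 1; unlike sample, it does not renormalise the weights.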
Another possibility with the purrr package:
library(purrr)
nextState <- map_int(probabilities, function(x) {sample(1:3, size = 1, prob = x)})
Data:
probabilities = list(c(0.2, 0.3, 0.5), c(0.1, 0.1, 0.8), c(0.3,0.4,0.3))
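For the additional question, where each draw depends on the previous state, one option is Reduce with accumulate = TRUE. This is a sketch rather than a speed-up: each step still needs the previous result, so the work stays sequential and only the explicit for loop disappears. draw_next is a hypothetical helper name.

```r
set.seed(1)
M <- list(matrix(c(0.1, 0.2, 0.7, 0.2, 0.2, 0.6, 0.3, 0.3, 0.4), nrow = 3, ncol = 3),
          matrix(c(0.3, 0.3, 0.4, 0.5, 0.5, 0, 0.1, 0.1, 0.8), nrow = 3, ncol = 3))

# Hypothetical helper: given the current state and the next matrix,
# draw the following state from the matching row
draw_next <- function(state, m) sample(1:3, size = 1, prob = m[state, ])

# init = 2 is the starting state; accumulate = TRUE keeps every
# intermediate state instead of only the last one
nextState <- unlist(Reduce(draw_next, M, init = 2, accumulate = TRUE))
length(nextState)  # 3: the initial state plus one draw per matrix
```

As in the question's loop, sample rescales each row internally, so the rows of M do not have to sum to 1.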

Change the size of the arrowheads in a markov chain plot

I've plotted a Markov chain in R, but I dislike the rather huge arrowheads that the plot function draws. Is there a way to make the heads smaller?
library( markovchain )
transition.matrix <- matrix( data = c( 0.5, 0, 0, 0.5, 0.2, 0, 0, 0.8, 1 ),
nrow = 3, ncol = 3,
dimnames = list( c( "A", "B", "C" ), c( "A", "B", "C" ) ) )
transition.matrix <- new( "markovchain", transitionMatrix = transition.matrix )
print( transition.matrix )
plot( transition.matrix )
markovchain uses the igraph package to plot transition matrices, so you can use parameters from that package to adjust the graph. For example, to set the arrowhead size:
plot(transition.matrix, edge.arrow.size=0.5)
For more information on customization, see the igraph manual.

Iteratively define user-defined discrete distributions

I am writing a script that, using -distr-, defines some discrete distributions based on the following objects:
margins <- c("discrete1", "discrete2")
vec1 <- list(support=c(0,1,2), probabilities=c(0.2, 0.2, 0.6))
vec2 <- list(support=c(12,14,20), probabilities=c(0.1, 0.15, 0.75))
Here is the code that works as expected: it creates the two distributions.
library("distr")
discrete1 <- DiscreteDistribution (supp = vec1[[1]], prob = vec1[[2]])
ddiscrete1 <- d(discrete1) # Density function
pdiscrete1 <- p(discrete1) # Distribution function
qdiscrete1 <- q(discrete1) # Quantile function
rdiscrete1 <- r(discrete1)
discrete2 <- DiscreteDistribution (supp = vec2[[1]], prob = vec2[[2]])
ddiscrete2 <- d(discrete2)
pdiscrete2 <- p(discrete2)
qdiscrete2 <- q(discrete2)
rdiscrete2 <- r(discrete2)
Once the two (or possibly more) distributions are defined, my final goal is to sample random numbers from them:
rdiscrete1(100)
rdiscrete2(100)
The problem with this code is that the number of distributions can be very high. I wonder how I could automate the creation of the functions in a more elegant manner.
Also, I need the two distribution objects to be of class DiscreteDistribution and not nested in lists (see is(discrete1) in my example).
l <- list(list(support = c(0, 1, 2), probabilities = c(0.2, 0.2, 0.6)),
list(support = c(12, 14, 20), probabilities = c(0.1, 0.15, 0.75)))
distrs <- lapply(1:length(l), function(n) {
d <- DiscreteDistribution(supp = l[[n]][[1]], prob = l[[n]][[2]])
list(d = d, dd = d(d), pd = p(d), qd = q(d), rd = r(d))
})
# First object of class DiscreteDistribution
is(distrs[[1]][[1]])
# [1] "DiscreteDistribution" "UnivariateDistribution" "AcDcLcDistribution"
# [4] "Distribution" "UnivDistrListOrDistribution"
# Random numbers
dim(sapply(distrs, function(x) x[[5]](100)))
# [1] 100 2
