I want to read a bunch of factor data and create a transition matrix from it that I can visualise nicely. I found a very nice package called 'heemod' which, together with 'diagram', does a decent job.
For my first quick-and-dirty approach, I ran a piece of Python code to get the matrix, then used this R snippet to draw the graph. Note that the transition probabilities come from that undisclosed and less important Python code, but you can also just assume that I calculated them on paper.
library('heemod')
library('diagram')
mat_dim <- define_transition(
state_names = c('State_A', 'State_B', 'State_C'),
.18, .73, .09,
.22, .0, .78,
.58, .08, .33);
plot(mat_dim)
However, I would like to integrate it all in R and generate both the transition matrix and the graph within R, directly from the sequence data.
This is what I have so far:
library(markovchain)
library('heemod')
library('diagram')
# the data --- this is normally read from a file
data = c(1,2,1,1,1,2,3,1,3,1,2,3,1,2,1,2,3,3,3,1,2,3,2,3,1,2,3,3,1,2,3,3,1)
fdata = factor(data)
rdata = factor(data,labels=c("State_A","State_B","State_C"))
# create transition matrix
dimMatrix = createSequenceMatrix(rdata, toRowProbs = TRUE)
dimMatrix
QUESTION: how can I convert dimMatrix so that define_transition can process it?
mat_dim <- define_transition( ??? );
plot(mat_dim)
Any ideas? Are there better/easier solutions?
The input to define_transition seems to be quite awkward. Perhaps this is due to my inexperience with the heemod package, but it seems the only way to supply the transitions is element by element.
Here is a workaround.
library(heemod)
library(diagram)
First convert the transition matrix to a list. I rounded the values, which is optional. These list elements correspond to the ... arguments of define_transition:
lis <- as.list(round(dimMatrix, 3))
Now add to the list any other named arguments you wish:
lis$state_names = colnames(dimMatrix)
And now pass these arguments to define_transition using do.call:
plot(do.call(define_transition, lis))
Update, in response to the question in the comments: transpose the matrix first, because as.list() walks a matrix column by column while define_transition expects the probabilities row by row:
lis <- as.list(t(round(dimMatrix, 3)))
lis$state_names = colnames(dimMatrix)
plot(do.call(define_transition, lis))
The reasoning behind do.call
The most obvious way (which does not work here) is to do:
define_transition(dimMatrix, state_names = colnames(dimMatrix))
however, this throws an error, since define_transition expects each transition to be supplied as a separate argument, not as a matrix or a list. To avoid typing:
define_transition(0.182, 0.222,..., state_names = colnames(dimMatrix))
one can put all the arguments in a list and then call do.call on that list as I have done.
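Putting the pieces together, here is a minimal end-to-end sketch, using the toy sequence from the question (the rounding is optional, as noted above):
library(markovchain)
library(heemod)
library(diagram)
# toy sequence data, normally read from a file
data  <- c(1,2,1,1,1,2,3,1,3,1,2,3,1,2,1,2,3,3,3,1,2,3,2,3,1,2,3,3,1,2,3,3,1)
rdata <- factor(data, labels = c("State_A", "State_B", "State_C"))
# row-stochastic transition matrix estimated from the sequence
dimMatrix <- createSequenceMatrix(rdata, toRowProbs = TRUE)
# t() because as.list() walks the matrix column by column,
# while define_transition() expects the probabilities row by row
lis <- as.list(t(round(dimMatrix, 3)))
lis$state_names <- colnames(dimMatrix)
mat_dim <- do.call(define_transition, lis)
plot(mat_dim)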
Related
I'm trying to perform multiple imputation on a dataset in R where I have two variables, one of which needs to be the same as or greater than the other one. I have set up the method and the predictor matrix, but I am having trouble understanding how to configure the post-processing. The manual (or main paper - van Buuren and Groothuis-Oudshoorn, 2011) states (section 3.5): "The mice() function has an argument post that takes a vector of strings of R commands. These commands are parsed and evaluated just after the univariate imputation function returns, and thus provide a way to post-process the imputed values." There are a couple of examples, of which the second one seems most useful:
R> post["gen"] <- "imp[[j]][p$data$age[!r[,j]]<5,i] <- levels(boys$gen)[1]"
This suggests to me that I could do:
R> ini <- mice(cbind(boys), max = 0, print = FALSE)
R> post["A"] <- "imp[[j]][p$data$B[!r[,j]]>p$data$A[!r[,j]],i] <- levels(boys$A)[boys$B]"
However, this doesn't work (when I plot A against B, I get random scatter rather than the points being confined to the half of the graph where A >= B).
I have also tried using the ifdo() function, as suggested in another Stack Exchange post:
post["A"] <- "ifdo(A < B), B"
However, it seems the ifdo() function is not yet implemented. I tried running the code suggested for inspiration, but I'm afraid my R programming skills are not up to it.
So, in summary, has anyone any advice about how to implement post-processing in mice such that value A >= value B in the final imputed datasets?
OK, so I've found an answer to my own question, though maybe this isn't the best way to do it.
In FIMD (van Buuren's Flexible Imputation of Missing Data), there is a suggestion to do this kind of thing outside the imputation process, which gives:
R> long <- mice::complete(imp, "long", include = TRUE)
R> long$A <- with(long, ifelse(B < A, B, A))
This seems to work, so I'm happy.
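For completeness, a minimal sketch of that workflow, using a hypothetical data frame dat with placeholder columns A and B, and converting the edited long data back to a mids object with as.mids() so the usual with()/pool() workflow still applies:
library(mice)
# hypothetical toy data with missing values in both columns
set.seed(1)
dat <- data.frame(A = rnorm(100), B = rnorm(100))
dat$A[sample(100, 20)] <- NA
dat$B[sample(100, 20)] <- NA
imp <- mice(dat, print = FALSE, seed = 1)
# post-process outside the imputation, as suggested in FIMD
long <- mice::complete(imp, "long", include = TRUE)
long$A <- with(long, ifelse(B < A, B, A))  # same adjustment as above
# back to a mids object for pooled analyses
imp2 <- as.mids(long)
fit  <- with(imp2, lm(A ~ B))
summary(pool(fit))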
heatmap(Web_Data$Timeinpage)
str(Web_Data)
heat = c(t(as.matrix(Web_Data$Timeinpage[,-1])))
heatmap(heat)
A few items to note here:
1) By including the c() in c(t(as.matrix(Web_Data$Timeinpage[,-1]))) you are creating a single vector, not a matrix. You can see this by running is.matrix(c(t(as.matrix(Web_Data$Timeinpage[,-1])))). heatmap (I believe) checks for a matrix because...
2) You need to provide a matrix with at least two rows and two columns for this function to work. Currently, you are only giving it one vector: time. You will need to provide some other feature of interest, such as Continent, for it to work correctly.
3) If you intend to plot ONLY one field, you may consider doing as suggested here and using the image() function (I included an example below).
4) I find the heatmap function somewhat dated in look. You may want to consider other popular functions, such as ggplot2's geom_tile (see here); a rough sketch follows the example below.
Below is an example code that should produce an output:
#fake data
Web_Data <- data.frame("Timeinpage" = c(123,321,432,555,332,1221,2,43,0, NA,10, 44),
OTHER = rep(c("good", "bad",6)) )
#a matrix built from TWO columns of my data frame. Notice the outer c(t(...)) wrapper is gone and the comma is removed from [,-1]; the character column is converted to numeric codes so heatmap gets a numeric matrix
heat <- matrix(c(Web_Data$Timeinpage[-1], as.numeric(factor(Web_Data$OTHER[-1]))), 2, 11)
#output
heatmap(heat)
#one row
heat2 <- as.matrix(sort(Web_Data$Timeinpage[-1])) #sorting as well
#output
image(heat2)
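And, as a rough sketch of point 4 above (assuming the same fake Web_Data; geom_tile wants the data in long form, one row per cell):
library(ggplot2)
plot_df <- data.frame(
  page   = seq_along(Web_Data$Timeinpage),
  metric = "Timeinpage",
  value  = Web_Data$Timeinpage
)
ggplot(plot_df, aes(x = page, y = metric, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "steelblue", na.value = "grey90")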
I'm trying to write a function I can apply to a character vector or a list instead of writing a loop. My goal is to run a regression for different endogenous variables and save the resulting tables. Since experienced R users tell us we should learn the apply functions, I want to give it a try. Here is my attempt:
Broken Example:
library(ExtremeBounds)
Data <- data.frame(var1=rbinom(30,1,0.2),var2=rbinom(30,1,0.2),var3=rnorm(30),var4=rnorm(30),var5=rnorm(30),var6=rnorm(30))
spec1 <- list(y=c("var1"),freevars=("var3"),doubtvars=c("var4","var5"))
spec2 <- list(y=c("var2"),freevars=("var4"),doubtvars=c("var3","var5","var6"))
specs <- c("spec1","spec2")
myfunction <- function(x){
eba <- eba(data=Data, y=x$y,
free=x$freevars,
doubtful=x$doubtvars,
reg.fun=glm, k=1, vif=7, draws=50, se.fun = se.robust, weights = "lri", family = binomial(logit))
output <- eba$bounds
output[, -(3:7)]  # drop columns 3 to 7 and return the trimmed table
}
lapply(specs,myfunction)
This gives me an error, which makes me guess that R does not understand when x should be "spec1" or "spec2". Also, I don't quite understand what lapply would try to collect here. Could you provide me with some best practices/hints on how to communicate such things to R?
error: Error in x$y : $ operator is invalid for atomic vectors
Working example:
Here is a working example for spec1, without using apply, that shows what I'm trying to do. I want to run this example over 7 specs, but I'm trying to get away from loops. The output does not have to be saved as a csv; a list of all outputs or any other collection would be great!
eba <- eba(data=Data, y=spec1$y,
free=spec1$freevars,
doubtful=spec1$doubtvars,
reg.fun=glm, k=1, vif=7, draws=50, se.fun = se.robust, weights = "lri", family = binomial(logit))
output <- eba$bounds
output <- output[,-(3:7)]
write.csv(output, "./Results/eba_pmr.csv")
Following the comments of #user20650, the solution is quite simple:
In the lapply call, use lapply(mget(specs), myfunction): mget() looks up the objects whose names are stored in specs and returns them as a named list, so lapply iterates over the actual specification lists rather than over the character strings.
Alternatively, one could define specs as a list, specs <- list(spec1,spec2), but that has the downside that lapply will return a list in which the different specifications are only numbered. The first version keeps the names of the specifications (spec1 and spec2), which makes working with the resulting list much easier.
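A short sketch of the two variants, reusing Data, specs, spec1, spec2 and myfunction from the question above:
# names taken from the object names in 'specs'
results <- lapply(mget(specs), myfunction)
names(results)    # "spec1" "spec2"
results$spec1     # trimmed bounds table of the first specification
# the list() variant only numbers the elements...
results_num <- lapply(list(spec1, spec2), myfunction)
# ...unless you name them explicitly
results_named <- lapply(list(spec1 = spec1, spec2 = spec2), myfunction)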
I am trying to create (vector) objects in R without specifying the names of the objects a priori. For example, if I have a list of length 3, I want to create the objects p1 to p3, and if I have a list of length 10, the objects p1 to p10 have to be created. The length should be arbitrary and not determined a priori.
Thanks for your help!
I guess the proper way of doing that is to use a list, p <- list(); then you can use p[[i]] with i as big as you wish, without having specified any length beforehand.
Then, once your list is filled up, you can name its elements: names(p) <- paste0("p", seq_along(p))
Finally, if you want all the pi variables to be directly accessible, you can add attach(p).
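A minimal sketch of that approach (make_value is just a hypothetical stand-in for whatever produces each element):
make_value <- function(i) rnorm(i)
p <- list()
for (i in 1:5) p[[i]] <- make_value(i)  # the list grows as needed
names(p) <- paste0("p", seq_along(p))
p$p3          # access by name without attach()
# attach(p)   # optional: makes p1 ... p5 directly visible (use with care)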
This is kind of a hack, but you can do the following:
short_list <- list(rnorm(10),rnorm(20),1:3)
long_list <- c(short_list,short_list )
paste0("p",seq_along(short_list))
mapply(assign, paste0("p",seq_along(short_list)), short_list, MoreArgs = list(envir = .GlobalEnv))
result:
> p3
[1] 1 2 3
you can do the same with long_list
I don't see a statistical model for which you would need this. It is better to start working with lists like short_list or with data.frames directly.
PS: If you just want to use it for glm, you probably want to learn about formulas in R.
glm(y~., data=your_data) takes all columns in your data frame that are not named y as regressors. Maybe this helps.
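A tiny sketch of that formula tip (your_data is a made-up example data frame):
set.seed(1)
your_data <- data.frame(y  = rbinom(20, 1, 0.5),
                        x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
# '.' picks up every column except y as a regressor
fit <- glm(y ~ ., data = your_data, family = binomial)
coef(fit)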
assign (and maybe also attach) are often a sign that you have not yet arrived at an "R-ish" version of the code.
Considering that you need this for modelling: if your $p_1, \ldots, p_n$ are of the same type, you can put them into a matrix inside a column of a data.frame (for modelling they need to be of the same length anyway):
df$matrix <- p.matrix
If you create the data.frame directly, you need to make sure the matrix is not expanded into individual data.frame columns:
df <- data.frame(matrix = I(p.matrix), ...)
Then glm(y ~ matrix, ...) will work.
For examples of this technique see e.g. packages pls or hyperSpec or the pls paper in the Journal of Statistical Software.
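A hedged sketch of that idea (all names are illustrative; p.matrix holds p1 ... p4 as columns):
set.seed(1)
p.matrix <- matrix(rnorm(30 * 4), nrow = 30)
y <- rbinom(30, 1, 0.5)
# I() keeps the matrix as a single column instead of expanding it
df <- data.frame(y = y, matrix = I(p.matrix))
fit <- glm(y ~ matrix, data = df, family = binomial)
summary(fit)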
apply is easy, but this one is a tough nut for me to crack:
In multi-parametric regression, optimisers are used to find the best fit of a parametric function to, say, (x1, x2) data. Often, and depending on the function, optimisers can be faster if they optimise transformed parameters (e.g. with R optimisers such as DEoptim or nls.lm).
From experience I know that using different transformations for different parameters of one parametric function works even better.
I wish to apply each function in x.trans (cf. below) to the element at the corresponding position in x.val:
A mock example to work with.
#initialise
x.val <- rep(100,5)  # EDIT: ignore this part ==> names(x.val) <- x.names
x.select <- c(1,0,0,1,1)
x.trans <- c(log10(x),exp(x),log10(x),x^2,1/x)
#select required elements, and corresponding names
x.val = subset(x.val, x.select == 1)
x.trans = subset(x.trans, x.select == 1)
# How I tried: apply function in x.trans[i] to x.val[i]
...
Any ideas? (I have tried with apply, and sapply but can't get at the functions stored in x.trans)
You must store the functions themselves, instead of calls to them:
x.trans <- c(log10,exp,log10,function(x)x^2,function(x)1/x)
Then this:
mapply(function(f, x) f(x), x.trans, x.val)
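Putting it together with the selection step from the question (a minimal sketch):
x.val    <- rep(100, 5)
x.select <- c(1, 0, 0, 1, 1)
x.trans  <- c(log10, exp, log10, function(x) x^2, function(x) 1/x)  # a list of functions
x.val   <- x.val[x.select == 1]
x.trans <- x.trans[x.select == 1]
# apply the i-th function to the i-th value: log10(100), 100^2, 1/100
mapply(function(f, x) f(x), x.trans, x.val)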