I'm using the GillespieSSA package for R, which makes it quick and easy to run a single instance of a stochastic simulation. But a single realization isn't very useful on its own; I'd like to be able to look at the variability that arises from chance. Consider a very basic model:
library(GillespieSSA)
x0 <- c(S = 499, I = 1, R = 0)
a  <- c("0.001*{S}*{I}", "0.1*{I}")
nu <- matrix(c(-1,  0,
               +1, -1,
                0, +1), nrow = 3, byrow = TRUE)
out <- ssa(x0, a, nu, tf = 100)
That puts out a profoundly complicated list, the interesting bits of which are in out$data.
My question is this: I can grab out$data for a single call, tag it with a variable indicating which run it came from, and then append that data to the previous runs' data to come out with one big set at the end. So, in crude pseudo-R, something like:
nruns <- 10
for (i in 1:nruns){
  out  <- ssa(x0, a, nu, tf = 100)
  data <- out$data
  run  <- rep(i, times = nrow(data))
  data <- cbind(data, run)
}
But with data not being overwritten at each iteration. I feel like I'm close, but having hopped between several languages this week, my R loop fu, weak as it already is, is failing me.
I am not sure if I understood you correctly. Do you want to do something like the following?
out <- lapply(X = 1:10, FUN = function(x) ssa(x0, a, nu, tf = 100)$data)
This would do 10 runs and put the resulting data matrices in a list. You could then access, e.g., the data from the second run with out[[2]].
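If you then want one big data set with a run identifier, as in the question's loop, a minimal sketch (assuming x0, a, and nu are defined as above) would be:
runs <- lapply(1:10, function(i) {
  d <- as.data.frame(ssa(x0, a, nu, tf = 100)$data)
  d$run <- i  # tag each realization with its run number
  d
})
big <- do.call(rbind, runs)  # one data frame holding all realizations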
I'm trying to perform multiple imputation on a dataset in R where I have two variables, one of which needs to be the same or greater than the other one. I have set up the method and the predictive matrix, but I am having trouble understanding how to configure the post-processing. The manual (or main paper - van Buuren and Groothuis-Oudshoorn, 2011) states (section 3.5): "The mice() function has an argument post that takes a vector of strings of R commands. These commands are parsed and evaluated just after the univariate imputation function returns, and thus provide a way to post-process the imputed values." There are a couple of examples, of which the second one seems most useful:
R> post["gen"] <- "imp[[j]][p$data$age[!r[,j]]<5,i] <- levels(boys$gen)[1]"
This suggests to me that I could do:
R> ini <- mice(cbind(boys), max = 0, print = FALSE)
R> post <- ini$post
R> post["A"] <- "imp[[j]][p$data$B[!r[,j]]>p$data$A[!r[,j]],i] <- levels(boys$A)[boys$B]"
However, this doesn't work (when I plot A v B, I get random scatter rather than the points being confined to one half of the graph where A >= B).
I have also tried using the ifdo() function, as suggested in another SX post:
post["A"] <- "ifdo(A < B), B"
However, it seems the ifdo() function is not yet implemented. I tried running the code suggested for inspiration, but I'm afraid my R programming skills are not that brilliant.
So, in summary, has anyone any advice about how to implement post-processing in mice such that value A >= value B in the final imputed datasets?
OK, so I've found an answer to my own question, but maybe this isn't the best way to do it.
In FIMD (van Buuren's Flexible Imputation of Missing Data), there is a suggestion to do this kind of thing outside the imputation process, which gives:
R> long <- mice::complete(imp, "long", include = TRUE)
R> long$A <- with(long, ifelse(B < A, B, A))
This seems to work, so I'm happy.
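For completeness: after editing the long-format data, it can be converted back into a mids object with mice's as.mids(), which expects the .imp/.id columns that complete(imp, "long", include = TRUE) produces:
R> imp2 <- mice::as.mids(long)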
Months ago, I searched for and found a function that could do the following things:
From script A, it could execute script B n times. The result would be a list with n elements, each containing only the last-returned object of script B. For example, if the last line of script B produces a vector of the means of some data, the list would contain only that vector.
I can't seem to find this function any more.
Here is what I need to do:
My script B consists of simulations and calculations performed on the simulated data. At the end, the script prints a matrix. I want to re-execute this script n times, resulting in a list with n elements, each containing a result matrix.
A bonus would be to be able to vary the seed in script B, so that, for example, list[[1]] contains the data simulated with set.seed(1).
I know this problem begs for a function from the apply family; in the following attempt, however, source() was not accepted as a function:
listsmalln <- lapply(n, source("Small samples/scriptB.R"))
Hope this was understandable! Thanks in advance!
I think you should use:
lapply(1:n, function(x) source("test.R"))
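Note that source() returns a list whose value element holds the result of the last evaluated expression, so to collect just the matrices you could use:
results <- lapply(1:n, function(i) source("Small samples/scriptB.R")$value)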
The easiest way to change the seed is to turn the script into a function that takes the seed as an input parameter.
If your script B can be a function that returns the matrix, it's straightforward:
Script B (as functionB.R):
functionB <- function(x){
  set.seed(x)  # vary the seed per call
  return(matrix(runif(100), ncol = 10))
}
Script A:
source('functionB.R')
lapply(1:10,function(x) functionB(x))
I essentially need to iterate through a set of values for parameters A,B,C to generate a table of results that will help me analyze the importance of such parameters. This is for a program in R.
Let's say that:
A goes from rangeA = 1:10
B goes from rangeB = 11:20
C goes from rangeC = 21:30
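In R those ranges would simply be:
rangeA <- 1:10
rangeB <- 11:20
rangeC <- 21:30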
The simplest (not most efficient) solution that I currently use goes something like this:
### here I create this empty data frame because I append each tmp calculation to it later
res <- data.frame()
### here I just create a random data frame for replication purposes
dataset <- data.frame(replicate(10, sample(0:1, 1000, rep = TRUE)))
ParameterAdjustment <- function(){
  for (a in rangeA){
    for (b in rangeB){
      for (c in rangeC){
        ### this is a complicated calculation that is much more
        ### difficult than the replicable example below
        tmp <- CalculateSomething(dataset, a, b, c)
        ### an example calculation
        tmp <- colMeans(dataset + a*b*c)
        tmp <- data.frame(t(tmp), sd(tmp))
        res <- rbind(res, tmp)
      }
    }
  }
  return(res)
}
My problem is that this works fine with my original dataset, where the calculations run on a 7000x500 data frame. However, my new datasets are much larger and performance has become a significant issue. Can anyone suggest or help with a more efficient solution? Thank you.
Not sure what language the above is, so not sure how relevant this is, but here goes: are you outputting/sending the data as you go, or collecting all the results in memory and then outputting them in one go at the end? When I've encountered similar problems with large datasets, this approach has helped me out a few times. For example, when sending tens of thousands of data points back to the client for a graph, rather than generating an array of all those points and sending that, I output to screen after each point and then free up the memory. It still takes a while, but that's unavoidable. The important bit is that it doesn't crash.
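In R specifically, the main cost in the posted code is growing res with rbind() inside the triple loop. A rough sketch of a common alternative, reusing the example calculation from the question: enumerate the combinations up front with expand.grid() and bind the pieces once at the end.
grid <- expand.grid(a = rangeA, b = rangeB, c = rangeC)  # all parameter combinations
rows <- lapply(seq_len(nrow(grid)), function(i) {
  tmp <- colMeans(dataset + grid$a[i] * grid$b[i] * grid$c[i])
  data.frame(t(tmp), sd = sd(tmp))
})
res <- do.call(rbind, rows)  # bind once instead of rbind-ing inside the loop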
Most of the time there is more than one way to implement a solution to a specific problem; hence, there are bad solutions and good solutions. I consider robust implementations to be the ones that make good use of for loops, while statements, lists, and any other functions and built-in types that make our lives easier.
I am looking forward to seeing and understanding some examples of good, high-level programming in R.
Assume a task like the following.
#IMPORT DATASET
Dataset <- read.table("blablabla\\dataset.txt", header=T, dec=".")
#TRAINING OF MODEL
Modeltrain <- lm(temperature~latitude+sea.distance+altitude, data=Dataset)
#COEFFICIENT VALUES FOR INDEPENDENT VARIABLES
Intercept <- summary(Modeltrain)$coefficients[1]
Latitude <- summary(Modeltrain)$coefficients[2]
Sea.distance <- summary(Modeltrain)$coefficients[3]
Altitude <- summary(Modeltrain)$coefficients[4]
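#NOTE: equivalently, all four coefficients can be pulled in one call: cf <- coef(Modeltrain)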
#ASK FOR USER INPUT AND CALCULATE y
i <- 1
while (i == 1){
  #LATITUDE (Xlat)
  cat("Input latitude value please: ")
  Xlat <- readLines(con="stdin", 1)
  Xlat <- as.numeric(Xlat)
  cat(Xlat, "is the latitude value. \n")
  #LONGITUDE (Xlong)
  #CALCULATE DISTANCE FROM SEA (Xdifs)
  cat("Input longitude value please: ")
  Xlong <- readLines(con="stdin", 1)
  Xlong <- as.numeric(Xlong)
  Xdifs <- min(4-Xlong, Xlat)
  cat(Xdifs, "is the calculated distance from sea value. \n")
  #ALTITUDE (Xalt)
  cat("Input altitude value please: ")
  Xalt <- readLines(con="stdin", 1)
  Xalt <- as.numeric(Xalt)
  cat(Xalt, "is the altitude value. \n")
  y <- Intercept + Latitude*Xlat + Sea.distance*Xdifs + Altitude*Xalt
  cat(y, "is the predicted temperature value.\n")
}
First of all, I would like to ask how, instead of blablabla\\dataset.txt, to set the path so that the script works on other operating systems too.
My second question is how to automate the above process to include additional X variables as well, without having to add them manually in the script.
I understand the latter question probably means rewriting the whole thing, so I don't expect a full answer. As I said before, I am more interested in understanding how it could be done and doing it myself.
Thanks.
P.S. Please don't ask for a reproducible example; I can't provide much else.
For the first question, you may want to look at the file.path command. For the second, I would approach this by defining, outside the while loop, two lists: one to store the prompts (e.g. list(lat="Please enter Latitude")) and another, with identical names, to store the input values. Then another loop inside the while iterates through the names of the first list, produces the relevant prompt, and stores the response in the named slot of the second list.
If your users are happy interacting with R in such a way, then you're lucky. Else, as #Roland suggests, delegate the UI to some other technology.
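A sketch of both suggestions (the path and the prompt texts here are made up):
#Portable path construction instead of a hard-coded Windows path
Dataset <- read.table(file.path("data", "dataset.txt"), header = TRUE, dec = ".")
#One list of prompts and one identically named list for the answers
prompts <- list(lat = "Input latitude value please: ",
                alt = "Input altitude value please: ")
answers <- setNames(vector("list", length(prompts)), names(prompts))
for (v in names(prompts)) {
  cat(prompts[[v]])
  answers[[v]] <- as.numeric(readLines(con = "stdin", 1))
}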
I'm using the library poLCA. To use the main command of the library one has to create a formula as follows:
f <- cbind(V1,V2,V3)~1
After this a command is invoked:
poLCA(f,data0,...)
V1, V2, V3 are the names of variables in the dataset data0. I'm running a simulation and I need to change the formula several times. Sometimes it has 3 variables, sometimes 4, sometimes more.
If I try something like:
f <- cbind(get(names(data0)[1]),get(names(data0)[2]),get(names(data0)[3]))~1
it works fine. But then I have to know in advance how many variables I will use. I would like to define an arbitrary vector
vars0 <- c(1,5,17,21)
and then create the formula as follows
f <- cbind(get(names(data0)[vars0]))~1
Unfortunately I get an error. I suspect the answer may involve some form of apply, but I still don't understand very well how these functions work. Thanks in advance for any help.
Using data from the examples in ?poLCA this (possibly hackish) idiom seems to work:
library(poLCA)
vec <- c(1,3,4)
M4 <- poLCA(do.call(cbind,values[,vec])~1,values,nclass = 1)
Edit
As Hadley points out in the comments, we're making this a bit more complicated than we need. In this case values is a data frame, not a matrix, so this:
M1 <- poLCA(values[,c(1,2,4)]~1,values,nclass = 1)
generates an error, but this:
M1 <- poLCA(as.matrix(values[,c(1,2,4)])~1,values,nclass = 1)
works fine. So you can just subset the columns as long as you wrap it in as.matrix.
#DWin mentioned building the formula with paste and as.formula. I thought I'd show you what that would look like using the election dataset.
library("poLCA")
data(election)
vec <- c(1,3,4)
f <- as.formula(paste("cbind(",paste(names(election)[vec],collapse=","),")~1",sep=""))
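The resulting f can then be passed to poLCA() as usual, e.g.:
M1 <- poLCA(f, election, nclass = 1)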