Porting Matlab cells to R to collect output from a FOR loop

I have an ODE model in Matlab for which I'm interested in performing some parameter sweeps.
I am trying to port the following code from Matlab to R:
for i = 1:numel(sweep1)
    initial_conditions(6) = sweep1(i);
    for j = 1:numel(sweep2)
        parameters(3) = sweep2(j);
        [t,y] = ode23s(@(timespan, initial_conditions) MODEL(timespan, initial_conditions, parameters), timespan, initial_conditions);
        results_cell{i,j} = [y(end,1), y(end,2)];
    end
end
The two FOR loops above first vary one initial condition (i), then, for each i, vary a parameter (j) and run the solver. The solver output for each iteration is collected in the cell array 'results_cell'.
This runs fine in Matlab, but I need to port it to R. The loops are the same and the solver is implemented using deSolve. However, I am not sure how to collect the solver results at each iteration, since R doesn't have cells like Matlab, or how to store the {i,j} index from each loop along with the two ODE outputs.
Ultimately I would like to plot a heat map of the ODE solver output against the values in each of the two parameter sweeps.
Thanks for any help.

Here is what I would do: I run ode23 once to get the structure of the solution.
sweep1 =2
sweep2 =3
library(pracma)
f <- function(t, x,i=1,j=0)
as.matrix(c(x[1] * ((i+j) - x[2]^2) -x[2], x[1]))
t0 <- 0
tf <- 20
x0 <- as.matrix(c(0, 0.25))
sol = ode23(f, t0, tf, x0,1,1)$y
res = tail(sol,1)
Then I use replicate to create the structure of the final output array. This trick avoids pre-allocating the array by hand; replicate does it for us.
results_cell = replicate(sweep1,replicate(sweep2,res))
Then I run the full simulation and assign each solution to results_cell:
for (i in seq(sweep1))
for (j in seq(sweep2))
results_cell[,,j,i] = tail(ode23(f, t0, tf, x0,i,j)$y,1)
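The closest base-R analogue of a Matlab cell array is a list with a dim attribute, which can be indexed with [[i, j]]. The following is a minimal sketch of that idea; final_state is a hypothetical stand-in for the actual solver call (it is not part of the original code), and the sweep values are made up for illustration.

```r
# Hypothetical stand-in for the ODE solve: returns the two final-state values.
final_state <- function(ic6, p3) c(ic6 + p3, ic6 * p3)

sweep1 <- c(1, 2, 4)   # assumed values for initial_conditions[6]
sweep2 <- c(3, 5, 7)   # assumed values for parameters[3]

# A list with dimensions: each element can hold any R object, like a Matlab cell.
results_cell <- vector("list", length(sweep1) * length(sweep2))
dim(results_cell) <- c(length(sweep1), length(sweep2))

for (i in seq_along(sweep1)) {
  for (j in seq_along(sweep2)) {
    # Replace final_state(...) with the real solver call and its last row.
    results_cell[[i, j]] <- final_state(sweep1[i], sweep2[j])
  }
}

results_cell[[2, 3]]   # the pair stored for sweep1[2] and sweep2[3]
```

Because the indices i and j are the positions in the list matrix, there is no need to store them separately; sweep1[i] and sweep2[j] recover the swept values.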

I'm assuming sweep1 and sweep2 are both vectors of numbers. What you can do is use expand.grid to make a data frame of the combinations of that, and then loop over the frame once with apply:
# sweep 1, sweep 2
sweep1 <- c(1, 2, 4)
sweep2 <- c(3, 5, 7)
# expand out the combinations
combinations <- expand.grid(sweep1=sweep1, sweep2=sweep2)
# apply over the data frame
results <- apply(combinations, 1, function(row) {
# set up the parameters from the row which has been passed in.
initial_conditions[6] <- row["sweep1"]
parameters[3] <- row["sweep2"]
# call the solver (placeholder arguments: substitute your actual solver call)
res <- ode23s(initial_conditions, parameters, function, whatever, ...)
# there should be a nicer way than calling nrow twice here, but R doesn't
# seem to have the nice 'end' keyword
# also, we copy in the row, so that's in the output.
c(row, one=res[nrow(res), 1], two=res[nrow(res), 2])
})
# because the apply has flipped rows to columns...
results <- as.data.frame(t(results))
results
# sweep1 sweep2 one two
# 1 1 3 ... ...
# 2 2 3 ... ...
# ...
The result of all this is a data frame of the input combinations and the output combinations. If you want more factors, add on a sweep3, but beware of the combinatorial complexity...
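For the heat map mentioned in the question, the long data frame can be reshaped into a matrix with one row per sweep1 value and one column per sweep2 value. This is a rough sketch only: final_state is a hypothetical stand-in for the solver call, and the sweep values are the example ones used above.

```r
sweep1 <- c(1, 2, 4)
sweep2 <- c(3, 5, 7)

# Hypothetical stand-in for "run the solver and take one final-state value".
final_state <- function(ic6, p3) ic6 * 10 + p3

# One row per combination; sweep1 varies fastest in expand.grid.
grid <- expand.grid(sweep1 = sweep1, sweep2 = sweep2)
grid$one <- mapply(final_state, grid$sweep1, grid$sweep2)

# Because sweep1 varies fastest, filling a matrix column by column puts
# sweep1 along the rows and sweep2 along the columns.
z <- matrix(grid$one,
            nrow = length(sweep1), ncol = length(sweep2),
            dimnames = list(sweep1, sweep2))
z
```

A matrix laid out this way can be passed to image(sweep1, sweep2, z) or a similar plotting function, since its rows and columns line up with the two sweeps.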

Related

Using "sample" within mclapply in R not working properly

I'm trying to run multiple iterations of a function, using a different subset of my data frame each time. In reality the function takes a very long time, so I want to split the iterations across multiple cores using mclapply. For each iteration I'm using sample to randomly select a subset of the data frame, and this happens inside the function I pass to mclapply. However, the results of the iterations in the output list are identical, suggesting that mclapply is not re-running the sample lines each time. This must be something to do with how I have written the code. Any ideas where I have gone wrong?
Here is a reproducible example of a small dataset that runs quickly. You will notice that the 10 iterations in the d.val.all output list are identical, which is not what I am after.
library(bipartite)
library(doBy)
library(parallel)
# create dummy data
ecto.matrix1=data.frame(replicate(10,sample(0:80,81,rep=TRUE)),Species.mix.90=c(sample(c("R","M","S","B"),81,rep=TRUE)))
# set up the function
funct.resample.d <- function(i) {
RedSites <- row.names(ecto.matrix1)[ecto.matrix1$Species.mix.90=="R"]
MountainSites <- row.names(ecto.matrix1)[ecto.matrix1$Species.mix.90=="M"]
randomSilverSites <- sample(row.names(ecto.matrix1)[ecto.matrix1$Species.mix.90=="S"],8,replace=F)
randomBlackSites <- sample(row.names(ecto.matrix1)[ecto.matrix1$Species.mix.90=="B"],8,replace=F)
resampledSites <- c(RedSites,MountainSites,randomSilverSites,randomBlackSites) # make vector of the site names
matrix=ecto.matrix1[resampledSites,] # select only those rows from the resampled row names
matrix1 = matrix[,colSums(matrix[,-c(ncol(matrix))]) > 0] # drop cols that sum to 0
matrix2=summaryBy(matrix1[,-c(ncol(matrix1))]~Species.mix.90,data=matrix1,FUN=sum)
for (col in 1:ncol(matrix2)){
colnames(matrix2)[col] <- sub(".sum", "", colnames(matrix2)[col]) # remove the sum bit from the col names
}
row.names(matrix2)<-matrix2$Species.mix.90 # make row names
matrix2=subset(matrix2, select=-c(Species.mix.90)) # drop host col
d.val <- dfun(matrix2)$dprime
}
# run mclapply
reps=c(1:10)
d.val.all <- mclapply(reps, funct.resample.d, mc.cores = 10)
In case anyone else is having similar issues, I figured out that the problem was with the summaryBy function rather than sample. I replaced summaryBy with aggregate, and the randomization worked fine.
matrix2=aggregate(. ~ Species.mix.90, matrix1, sum)
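To illustrate what that aggregate call does, here is a small self-contained sketch on made-up data. The data frame toy and its column names are assumptions for illustration only; the formula . ~ host sums every other column within each level of host.

```r
# Made-up data: two count columns and a grouping column.
toy <- data.frame(a = c(1, 2, 3, 4),
                  b = c(10, 20, 30, 40),
                  host = c("R", "M", "R", "M"))

# Sum all non-grouping columns within each host group.
sums <- aggregate(. ~ host, data = toy, FUN = sum)
sums
```

Unlike the summaryBy output in the question, the column names keep their original form, so no ".sum" suffix has to be stripped afterwards.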

In R, is it possible to use a pair, tuple or equivalent in a matrix?

I am trying to create a matrix of coordinates (indexes) from which I randomly pick one using the sample function. I then use the picked coordinates to select a cell in another matrix. What is the best way to do this? The trouble is how to store these integers in the matrix so that they are easy to separate. Right now I have them stored as strings with a comma, which I then split. Someone suggested I use a pair, or a string, but I cannot seem to get these to work with a matrix. Thanks!
EDIT: What I currently have looks like this (changed a little to make sense out of context):
probs <- matrix(c(0,0,0.6,0,0,
0,0.7,1,0.7,0,
0.6,1,0,1,0.6,
0,0.7,1,0.7,0,
0,0,0.6,0,0),5,5)
cordsMat <- matrix("",5,5)
for (x in 1:5){
for (y in 1:5){
cordsMat[x,y] = paste(x,y,sep=",")
}
}
cords <- sample(cordsMat,1,,probs)
cordsVec <- unlist(strsplit(cords,split = ","))
cordX <- as.numeric(cordsVec[1])
cordY <- as.numeric(cordsVec[2])
otherMat[cordX,cordY]
It sort of works, but I would also be interested in a better way, as this will get repeated a lot.
If you want to set the probabilities, you can pass them to sample through its prob argument:
# creating the matrix
matrix(sample(rep(1:6, 15:20), 25), 5) -> other.mat
# set the probs vec
probs <- c(0,0,0.6,0,0,
0,0.7,1,0.7,0,
0.6,1,0,1,0.6,
0,0.7,1,0.7,0,
0,0,0.6,0,0)
# the coordinates matrix
mat <- as.matrix(expand.grid(1:nrow(other.mat),1:ncol(other.mat)))
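# sampling a row index randomly (sample from the row numbers, not from mat itself,
# so that prob has one entry per candidate row)
sample(nrow(mat), 1, prob=probs) -> rand
# getting the value
other.mat[mat[rand,1], mat[rand,2]]
# [1] 6  (example output; the value depends on the random draw)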
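An alternative that avoids building a coordinate table at all is to sample a single linear index and convert it to a row/column pair with arrayInd. This is a sketch under the assumption that the probability matrix and the target matrix have the same dimensions; otherMat is filled with made-up values here.

```r
probs <- matrix(c(0,   0,   0.6, 0,   0,
                  0,   0.7, 1,   0.7, 0,
                  0.6, 1,   0,   1,   0.6,
                  0,   0.7, 1,   0.7, 0,
                  0,   0,   0.6, 0,   0), 5, 5)
otherMat <- matrix(1:25, 5, 5)   # made-up target matrix

# Draw one cell number (1..25) with the given weights.
idx <- sample(length(probs), 1, prob = probs)

# Convert the linear index to (row, column).
rc <- arrayInd(idx, dim(probs))
cordX <- rc[1, 1]
cordY <- rc[1, 2]

value <- otherMat[cordX, cordY]
```

If only the value is needed and not the coordinates, otherMat[idx] gives the same cell directly, because R matrices accept linear indices.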

Using by in parallel in R

I am a novice R programmer. I have written code that applies a function to a data frame split by a factor. The data frame contains 1,324,961 observations, and the variable used to slice it has 64,376 levels.
The code is as follows:
library("readstata13")
# Reading the Stata Data file into R
bod_fb <- read.dta13("BoD_nonmissing_fb.dta")
gen_fuzzy_blau <- function(bod_sample){
# Here we drop the Variables that are not required in creating the Fuzzy-Blau index
bod_sample <- as.data.frame(bod_sample)
bod_sample$tot_occur <- as.numeric(bod_sample$tot_occur)
bod_sample$caste1_occ <- as.numeric(bod_sample$caste1_occ)
bod_sample$caste2_occ <- as.numeric(bod_sample$caste2_occ)
bod_sample$caste3_occ <- as.numeric(bod_sample$caste3_occ)
bod_sample$caste4_occ <- as.numeric(bod_sample$caste4_occ)
# Calculating the Probabilites of a director belonging to a caste
bod_sample$caste1_occ <- (bod_sample$caste1_occ)/(bod_sample$tot_occur)
bod_sample$caste2_occ <- (bod_sample$caste2_occ)/(bod_sample$tot_occur)
bod_sample$caste3_occ <- (bod_sample$caste3_occ)/(bod_sample$tot_occur)
bod_sample$caste4_occ <- (bod_sample$caste4_occ)/(bod_sample$tot_occur)
#Dropping the Total Occurances column, as we do not need it anymore
bod_sample$tot_occur<- NULL
# Here we replace all the blanks with NA
bod_sample <- apply(bod_sample, 2, function(x) gsub("^$|^ $", NA, x))
bod_sample <- as.data.frame(bod_sample)
# Here we push all the NAs in the caste names and caste probabilities to the end of the row
# So if there are only two castes against a name, then they become caste1 and caste2
caste_list<-data.frame(bod_sample$caste1,bod_sample$caste2,bod_sample$caste3,bod_sample$caste4)
caste_list = as.data.frame(t(apply(caste_list,1, function(x) { return(c(x[!is.na(x)],x[is.na(x)]) )} )))
caste_list_prob<-data.frame(bod_sample$caste1_occ,bod_sample$caste2_occ,bod_sample$caste3_occ,bod_sample$caste4_occ)
caste_list_prob = as.data.frame(t(apply(caste_list_prob,1, function(x) { return(c(x[!is.na(x)],x[is.na(x)]) )} )))
# Here we write two functions: 1. gen_castelist
# 2. gen_caste_prob
# gen_castelist: This function takes the row number (serial number of the direcor)
# and returns the names of all the castes for which he has a non-zero
# probability.
# gen_caste_prob: This function takes the row number (serial number of the director)
# and returns the probability with which he belongs to the caste
#
gen_castelist <- function(x){
y <- caste_list[x,]
y <- as.vector(y[!is.na(y)])
return(y)
}
gen_caste_prob <- function(x){
z <- caste_list_prob[x,]
z <- z[!is.na(z)]
z <- as.numeric(z)
return(z)
}
caste_ls <-list()
caste_prob_ls <- list()
for(i in 1:nrow(bod_sample))
{
caste_ls[[i]]<- gen_castelist(i)
caste_prob_ls[[i]]<- gen_caste_prob(i)
}
gridcaste <- expand.grid(caste_ls)
gridcaste <- data.frame(lapply(gridcaste, as.character), stringsAsFactors=FALSE)
gridcasteprob <- expand.grid(caste_prob_ls)
# Generating the Joint Probability
gridcasteprob$JP <- apply(gridcasteprob,1,prod)
# Generating the Similarity Index
gen_sim_index <- function(x){
x <- t(x)
a <- as.data.frame(table(x))
sim_index <- sum(a$Freq^2)/(sum(a$Freq))^2
return(sim_index)
}
gridcaste$sim_index <- apply(gridcaste,1,gen_sim_index)
# Generating fuzzyblau
gridcaste$fb <- gridcaste$sim_index * gridcasteprob$JP
fuzzy_blau_index <- sum(gridcaste$fb)
remove_list <- c("gridcaste","")
return(fuzzy_blau_index)
}
fuzzy_blau_output <- by(bod_fb,bod_fb$code_year,gen_fuzzy_blau)
# Saving the output as a dataframe with two columns
# Column 1 is the fuzzy blau index
# Column 2 is the code_year
code_year <- names(fuzzy_blau_output)
fuzzy_blau <- as.data.frame(as.vector(unlist(fuzzy_blau_output)))
names(fuzzy_blau) <- c("fuzzy_blau_index")
fuzzy_blau$code_year <- code_year
bod_fb <- merge(bod_fb,fuzzy_blau,by = "code_year")
save.dta13(bod_fb,"bod_fb_example.dta")
If the code is tl;dr, the summary is as follows:
I have a data frame bod_fb. I need to apply the gen_fuzzy_blau function to this data frame, slicing it by the levels of bod_fb$code_year.
Since the function is very large, sequential processing takes more than a day and ends up running out of memory. The function gen_fuzzy_blau returns a numeric value fuzzy_blau_index for each code_year of the data frame. I use by to apply the function to each slice. Is there a way to run this in parallel, so that multiple instances of the function run at once on different slices of the data frame? I did not find a by implementation in the parallel package, and I do not know how to pass the data frames as iterators when using the foreach and doParallel packages.
I have an AMD A8 laptop with 4GB RAM running Windows 7 SP1 Home Basic. I have allocated 20GB of page file memory (this was after I got the memory error).
Thank you
EDIT 1: @milkmotel I have eliminated the redundancy in the code and removed the for loops, but a huge amount of time is still spent in gen_sim_index. I am using proc.time() to measure how long each part of the code takes.
The function is supposed to do the following to a row:
If we have a row (not a vector), say a a b c, the similarity index will be (2/4)^2 + (1/4)^2 + (1/4)^2, i.e. the sum over the unique elements of the row of (number of occurrences of the element / total number of elements in the row)^2.
I am unable to use the apply function directly on the row because each element in the row has different factor levels, so table() does not output the frequencies properly.
What is an efficient way to code the gen_sim_index function?
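One possible way to write gen_sim_index without table() is to count occurrences with match and tabulate after converting the row to character, which sidesteps the factor-level problem. This is a sketch, not a drop-in replacement: it assumes the row contains no NA values (table() drops NAs, whereas this version would count NA as its own category).

```r
gen_sim_index <- function(x) {
  x <- as.character(unlist(x))            # ignore factor levels, compare as text
  counts <- tabulate(match(x, unique(x))) # occurrences of each distinct value
  sum((counts / length(x))^2)             # sum of squared proportions
}

# The example row from the question: a a b c
gen_sim_index(c("a", "a", "b", "c"))      # (2/4)^2 + (1/4)^2 + (1/4)^2

# Row-wise use on a character matrix (apply also coerces a data frame this way).
m <- rbind(c("a", "a", "b", "c"),
           c("a", "a", "a", "a"))
row_index <- apply(m, 1, gen_sim_index)
```

Avoiding the data frame conversion inside the function (as.data.frame(table(x)) in the original) is where most of the saving would be expected to come from, although this has not been timed on the real data.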
You're saving your data six times over in six different variables. Try not doing that.
It also takes a day because you're running character matching with gsub() on a ridiculous amount of data.
Take your code out of your gen_fuzzy_blau function as it provides no value to wrap it up into one function rather than running it all independently. Then run it all one line at a time. If it takes too long to run, reconsider your method. Your code is incredibly inefficient.
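On the parallel part of the question: by() is essentially split() followed by an apply over the pieces, so one option is to split the data frame yourself and hand the pieces to parLapply from the parallel package, which uses a socket cluster and therefore also runs on Windows. The sketch below uses made-up data and a hypothetical slice_summary function in place of bod_fb and gen_fuzzy_blau; the worker count of 2 is an arbitrary assumption.

```r
library(parallel)

# Made-up data standing in for bod_fb.
toy <- data.frame(code_year = rep(c("A_2001", "B_2001", "C_2002"), each = 4),
                  value = 1:12)

# Hypothetical stand-in for gen_fuzzy_blau: one number per slice.
slice_summary <- function(piece) sum(piece$value)

# The same slicing that by() performs, as a named list of data frames.
pieces <- split(toy, toy$code_year)

cl <- makeCluster(2)                       # socket cluster with 2 workers
out <- parLapply(cl, pieces, slice_summary)
stopCluster(cl)

# Collect the per-slice results into a two-column data frame.
result <- data.frame(code_year = names(out),
                     index = unlist(out),
                     row.names = NULL)
result
```

If the real function needs packages or helper objects, they would have to be made available on the workers (for example with clusterEvalQ or clusterExport). Parallel workers also each need memory, so on a 4GB machine this may not help unless the per-slice memory use is reduced first.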

How to compute summation in R

My R code:
((x[1]-xm)^2)+((x[2]-xm)^2)+((x[3]-xm)^2)+((x[4]-xm)^2)+((x[5]-xm)^2)+((x[6]-xm)^2)
This computation would be much easier if I formulated the problem as a summation. How do I do that in R? Something like:
sum((x[i]-xm)^2) for i=1 to i=6?
x is a data frame.
Without reading all the responses in this thread: there is a really easy way to do summations in R.
Modify the following line as needed to accommodate a matrix or other type of vector:
i <- 0:5; sum(i^2)
Use i for your index when accessing a position in your vector/array.
Note that i can be any vector.
You need to use sum(), example below:
IndexStart <- 1
x <- seq(IndexStart, 6, 1)
xm <- 1
result1 <- ((x[1]-xm)^2)+((x[2]-xm)^2)+((x[3]-xm)^2)+((x[4]-xm)^2)+((x[5]-xm)^2)+((x[6]-xm)^2)
print(result1)
# [1] 55
result2 <- sum((x-xm)^2) # <- Solution
print(result2)
# [1] 55
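Since the question states that x is a data frame, the same vectorised sum can be applied to one of its columns. The sketch below assumes a data frame with a single numeric column named x and takes xm to be the column mean; both are assumptions, as the question does not say how xm is defined.

```r
df <- data.frame(x = c(2, 4, 4, 4, 5, 5, 7, 9))   # made-up data

xm <- mean(df$x)               # assumed definition of xm
ss <- sum((df$x - xm)^2)       # sum of squared deviations, no loop needed

# When xm is the mean, this equals (n - 1) times the sample variance.
ss_check <- (nrow(df) - 1) * var(df$x)
```

The subtraction df$x - xm is applied to every element at once, which is why no index i is needed.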

How to create a loop to generate a list of random samples in R?

I'm trying to create a loop that creates a series of objects, each containing a random sample, like this:
sample <- ceiling(runif(9, min=0, max=20))
(This example uses a rounded uniform, but it could be replaced by a normal, Poisson or any other distribution.)
I built a loop to generate several of these samples automatically, with the aim of including them in a data frame. The loop I designed was this:
N=50
dep=as.vector(N)
count=1
for (i in 1:N){
dep[count] <- ceiling(runif(9, min=0, max=20))
count=count+1
}
But it didn't work! Each dep[i] holds only a single number, not a vector of nine.
How should I do it? And what if I want to include every dep[i] in a data frame?
Thanks so much, I hope you understand what I want.
It's because you've made dep a vector (these are 1D by default), and each element of a vector holds a single value, but you're trying to store a length-9 sample in dep[count].
You can start dep off as NULL and rbind (row-bind) to it in the loop. Also, note that instead of using count in your loop you can just use i:
dep <- NULL
for (i in 1:N){
dep <- rbind(dep, ceiling(runif(9, min=0, max=20)))
}
# if you look at dep now it's a 2D matrix.
# We'll convert to data frame
dep <- as.data.frame(dep)
However, there's a simpler way to do this. You don't have to generate dep row-by-row, you can generate it up front, by making a vector containing 9*N of your rounded uniform distribution numbers:
dep <- ceiling(runif(9*N,min=0,max=20))
Now, dep is currently a vector of length 9*N. Let's make it into an N x 9 matrix:
dep <- matrix(dep,nrow=N)
Done!
So you can do all your code above in one line:
dep <- matrix( ceiling(runif(9*N,min=0,max=20)), nrow=N )
If you want you can call data.frame on dep (after it's been put into its 2D matrix form) to get a data frame.
As @mathematical.coffee explained. Also, in your case you can use sample instead of runif. In fact sample.int is more reliable, and about 3x faster than using runif here:
N <- 1000000
system.time( dep <- matrix(sample.int(20, 9*N, replace=TRUE), N) ) # 0.16 secs
range(dep) # 1 20
system.time( dep <- matrix(ceiling(runif(9*N, min=0, max=20)), N) ) # 0.45 secs
range(dep) # 1 20
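If a list with one sample per element is what is wanted (as the question title suggests), replicate with simplify = FALSE produces it without an explicit loop, and the list can still be bound into a data frame afterwards. This is a minimal sketch using the same rounded-uniform generator as the question.

```r
N <- 50

# A list of N elements, each a vector of 9 rounded uniform draws.
samples <- replicate(N, ceiling(runif(9, min = 0, max = 20)), simplify = FALSE)

samples[[1]]                      # the first sample of nine values

# Stack the samples as rows and convert to a data frame (N rows, 9 columns).
dep <- as.data.frame(do.call(rbind, samples))
dim(dep)
```

Swapping in another generator only requires changing the expression inside replicate, for example rnorm(9) or rpois(9, lambda = 3).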
