Programming state switches in R

I am trying to write a program that switches a system from state A to state B and vice versa.
rnumbers <- data.frame(replicate(5,runif(2000, 0, 1)))
Imagine this data frame of random numbers drawn from a uniform distribution, except with 10,000 rows instead of 20.
Set the probabilities of going to state A and state B:
dt <- c(.02)
A <- dt*1
B <- dt*.5
Now make a function that goes through the data frame rnumbers and puts in a 0 if the number is less than B and a 1 if the number is less than A:
step_generator <- function(x) {
  step <- ifelse(x < B, 0, ifelse(x < A, 1, NA))
  return(step)
}
state <- apply(rnumbers, 2, step_generator)
This essentially gives me what I want: columns containing 0, 1, or NA depending on the value of the random number in rnumbers. However, I am missing a couple of things:
1) I would like to keep track of how long each state lasts. What I mean is: if you treat each row as one time step of dt (0.02, as above), I want to be able to plot "state vs. time". To address this, here is what I tried:
state1 <- transform(state, time = rep(dt))
state2 <- transform(state1, cumtime = cumsum(time))
This gets me close to what I want: cumtime runs from 0.02 up to 40. However, I want the clock to start at 0 in the first row and add 0.02 for every subsequent row.
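A hedged sketch of one possible fix, building on the transform() call above (not part of the original attempt): subtract the first time step so the first row sits at 0.
state2 <- transform(state1, cumtime = cumsum(time) - time[1])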
2) I need to know how long each state lasts. Essentially, I want to go through each column and ask how much time (cumulative) is spent in each state before it switches. That would give me a distribution of dwell times for state A and state B, stored in another data frame.
I think this makes sense, if anything is unclear please let me know and I will clarify.
Thanks for any and all help!

The range "number is less than .02*1 but greater than .02*.5" is very narrow, so as you set this simulation up, most of the first row will almost certainly be zero. And you cannot really hope to succeed with ifelse when the conditions have any look-back feature: that function doesn't allow "back-indexing".
rstate <- rnumbers # copy the structure
rstate[] <- NA # preserve structure with NA's
# Init:
rstate[1, ] <- rnumbers[1, ] < .02 & rnumbers[1, ] > 0.01
step_generator <- function(col, rnum) {
  for (i in 2:length(col)) {
    if (rnum[i] < B) {
      col[i] <- 0
    } else if (rnum[i] < A) {
      col[i] <- 1
    } else {
      col[i] <- col[i - 1]  # no switch: carry the previous state forward
    }
  }
  return(col)
}
# Run for each column index:
for (cl in 1:5) {
  rstate[, cl] <- step_generator(rstate[, cl], rnumbers[, cl])
}
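For part 2 of the question (how long each state lasts), here is a possible sketch, not part of the answer above: rle() collapses each column into runs of identical values, and run length times dt gives the duration of each stay in a state.
# Hedged sketch: dwell-time distribution per column via run-length encoding
state_durations <- lapply(rstate, function(col) {
  r <- rle(col)
  data.frame(state = r$values, duration = r$lengths * dt)
})
head(state_durations[[1]])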

Related

Running random operations over vector, conditional

I am doing some modelling and wish to simulate randomness.
I have a total number of runs run_times which is 5 in this example.
A vector run_lengths holds the possible lengths of runs of 1's to print; if a chosen run length is 3, three 1's are printed.
sample_data is a vector of 1's and 0's. A run of 1's may only be started where sample_data == 1, but not every such position should be picked: the start positions are chosen at random, and runs may be started at most run_times (5) times in total.
There are a few moving parts, for sure.
I am tackling the problem in this manner:
I am able to select a run length at random with sample(run_lengths, 1). I am unsure how to select positions in sample_data at random, and I'm trying to keep a counter in order to stay under run_times:
run_lengths <- c(2,4,5,6,7,8,1)
run_times <- 5
sample_data <- c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0)
# Randomly select 1's from sample_data, when find 1, randomly print 1's along run_lengths
# Only print a certain amount of times (run_times)
# Pick run_lengths at random == sample(run_lengths,1)
# Pick df$sample 1's at random, how to randomly select????
count <- 0 # keep track of how many random run_lengths have been applied
res <- NULL
while (length(res) < length(sample_data)) {
  if (sample_data[length(res) + 1] == 1 & count < run_times) { # not sure how to pick sample_data == 1 at random?
    res <- c(res, rep(1, sample(run_lengths, 1))) # if signal == 1 (randomly), rep a random run_length
    count <- count + 1 # count how many random run_lengths have been applied
  } else {
    res <- c(res, 0) # if the condition is not true, print 0 instead of 1
  }
}
res <- res[1:length(sample_data)]
res
I have maybe 60% of it working? I'm not sure what the best approach is for choosing random 1's from sample_data. I'm also not sure how to keep the number of runs under the run_times maximum. I am attempting to count each time the condition is true; once the count is exceeded, any further true conditions would be ignored.
OK, time to put down some code, though I'm still not sure whether it's right:
sample_data <- c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0)
# take indices of sample_data where value == 1
i <- which(sample_data %in% 1)
# now shuffle them all, no replacement - random positions of 1's
p <- sample(i, length(i), replace = FALSE)
print(sample_data[p[1]])
print(sample_data[p[2]])
print(sample_data[p[3]])
...
Is this what you want?
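For what it's worth, here is a hedged sketch combining the two ideas above (the counter and the random index selection); the name starts is mine, and it assumes overlapping runs may simply overwrite one another:
set.seed(1) # for reproducibility
# pick at most run_times starting points among the positions where sample_data == 1
starts <- sample(which(sample_data == 1), run_times)
res <- rep(0, length(sample_data))
for (s in starts) {
  len <- sample(run_lengths, 1) # a random run length
  res[s:min(s + len - 1, length(res))] <- 1 # print 1's for len steps, clipped at the end
}
res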

Trouble coding a number of matrix models to run simultaneously

I made a matrix-based population model, but I would like to run more than one simultaneously, representing different groups of animals, so that dispersing individuals can move between matrices. I originally just repeated everything to get a second matrix, but then I realised that because I run the model using a for loop and break() under certain conditions (when that specific matrix should stop running, i.e. that group has died out), it is, understandably, stopping the whole model rather than just that single matrix.
I was wondering if anyone had suggestions on the best way to code the model so that instead of breaking and stopping the whole for loop, it just stops running that specific matrix. I'm a little stumped. I have included a single run of one matrix below.
Also, if anyone has a more efficient way of creating and running 9 matrices than writing everything out 9 times, advice is much appreciated.
n.steps <- 100
mats <- array(0, c(85, 85, n.steps))
ns <- array(0, c(85, n.steps))
ns[1, 1] <- 0
ns[12, 1] <- rpois(1, 3)
ns[24, 1] <- rpois(1, 3)
ns[85, 1] <- 1
birth <- 4
nextbreed <- 12
for (i in 2:n.steps) {
  # set up an empty matrix
  mat <- matrix(0, nrow = 85, ncol = 85)
  surv.age.1 <- 0.95
  x <- 2:10
  diag(mat[x, (x - 1)]) <- surv.age.1
  surv.age.a <- 0.97
  disp <- 1:74
  disp <- disp * -0.001
  disp1 <- 0.13
  disp <- 1 - (disp + disp1)
  survdisp <- surv.age.a * disp
  x <- 11:84
  diag(mat[x, (x - 1)]) <- survdisp
  if (i == nextbreed) {
    pb <- 1
  } else {
    pb <- 0
  }
  if (pb == 1) {
    nextbreed <- nextbreed + 12
  }
  mat[1, 85] <- pb * birth
  mat[85, 85] <- 1
  death <- sample(c(replicate(1000,
    sample(c(1, 0), prob = c(0.985, 1 - 0.985), size = 1))), 1)
  if (death == 0) {
    break()
  }
  mats[, , i] <- mat
  ns[, i] <- mat %*% ns[, i - 1]
}
group.size <- apply(ns[1:85, ], 2, sum)
plot(group.size)
View(mat)
View(ns)
As somebody else suggested on Twitter, one solution might be to simply turn the matrix into all 0s whenever death happens. It looks to me like death is the probability that a local population disappears? In which case it seems to make good biological sense to just turn the entire population matrix into 0s.
A few other small changes: I made a list of replicate simulations so I could summarize them easily.
If I understand correctly,
death <- sample(c(replicate(1000, sample(c(1, 0), prob = c(0.985, 1 - 0.985), size = 1))), 1)
says "a local population dies completely with probability 1.5%". In which case, I think you could replace it with rbinom(). I did that below, and my plots look similar to those I made with your code.
Hope that helps!
lots <- replicate(100, simplify = FALSE, expr = {
  # re-initialize state inside each replicate so the runs are independent
  ns <- array(0, c(85, n.steps))
  ns[12, 1] <- rpois(1, 3)
  ns[24, 1] <- rpois(1, 3)
  ns[85, 1] <- 1
  nextbreed <- 12
  for (i in 2:n.steps) {
    # set up an empty matrix
    mat <- matrix(0, nrow = 85, ncol = 85)
    surv.age.1 <- 0.95
    x <- 2:10
    diag(mat[x, (x - 1)]) <- surv.age.1
    surv.age.a <- 0.97
    disp <- 1:74
    disp <- disp * -0.001
    disp1 <- 0.13
    disp <- 1 - (disp + disp1)
    survdisp <- surv.age.a * disp
    x <- 11:84
    diag(mat[x, (x - 1)]) <- survdisp
    if (i == nextbreed) {
      pb <- 1
    } else {
      pb <- 0
    }
    if (pb == 1) {
      nextbreed <- nextbreed + 12
    }
    mat[1, 85] <- pb * birth
    mat[85, 85] <- 1
    # survive with probability 0.985, matching the original sample() scheme
    death <- rbinom(1, size = 1, prob = 0.985)
    if (death == 0) {
      mat[] <- 0 # zero out the whole transition matrix instead of break()
    }
    mats[, , i] <- mat
    ns[, i] <- mat %*% ns[, i - 1]
  }
  ns
})
lapply(lots, FUN = function(x) apply(x[1:85,],2,sum))

Reducing row reference by 1 for each for loop iteration in R

I'm working on a function in R that iterates over a data frame in reverse. Right now, the function takes a set number of columns and finds the mean of each column up to a set row number. What I'd like is for the row number to decrease by 1 on each iteration of the for loop. The goal is to create a "triangular" reference that uses one less value for the column mean on each iteration.
Here's some code you can use to create sample data that works with the function.
test = data.frame(p1 = c(1,2,0,1,0,2,0,1,0,0), p2 = c(0,0,1,2,0,1,2,1,0,1))
Here's the function I'm working with. My best guess is that I'll need to add some sort of reference to i in the mean(data[1:row, i]) section, but I can't seem to work the logic/math out on my own.
averagePickup = function(data, day, periods) {
  # data will be your Pickup Data
  # day is the day you're forecasting for (think row number)
  # periods is the period or range of periods that you need to average (a column or range of columns)
  pStart = ncol(data)
  pEnd = ncol(data) - (periods - 1)
  row = (day - 1)
  new_frame <- as.data.frame(matrix(nrow = 1, ncol = periods))
  for (i in pStart:pEnd) {
    new_frame[1, 1 + abs(ncol(data) - i)] <- mean(data[1:row, i])
  }
  return(sum(new_frame[1, 1:ncol(new_frame)]))
}
Right now, calling averagePickup(test, 5, 2) yields 1.75. This is the sum of the means of the first 4 values in each of the two columns. What I'd like the result to be is 1.333333: the sum of the mean of the first 4 values in column p1 and the mean of the first 3 values in column p2.
Please let me know if you need any further clarification, I'm still a total scrub at R!!!
Like this?
test = data.frame(p1 = c(1,2,0,1,0,2,0,1,0,0), p2 = c(0,0,1,2,0,1,2,1,0,1))
averagePickup = function(data, first, second) {
  return(mean(data[1:first, 1]) + mean(data[1:second, 2]))
}
averagePickup(test,4,3)
This gives you 1.333333
Welp, I ended up figuring it out with a few more head bashes against the wall. Here's what worked for me:
averagePickup = function(data, day, periods) {
  # data will be your Pickup Data
  # day is the day you're forecasting for (think row number)
  # periods is the period or range of periods that you need to average (a column or range of columns)
  pStart = ncol(data)
  pEnd = ncol(data) - (periods - 1)
  row = (day - 1)
  new_frame <- as.data.frame(matrix(nrow = 1, ncol = periods))
  q <- 0 # instantiate a q value; run 0 will be the first one
  for (i in pStart:pEnd) {
    # subtract q from the row count, so each column gets its own row limit
    new_frame[1, 1 + abs(ncol(data) - i)] <- mean(data[1:(day - periods + q), i])
    q <- q + 1 # increment q: moving left one column, use one more row (one less moving right)
  }
  return(sum(new_frame[1, 1:ncol(new_frame)]))
}
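Calling it with the original inputs now returns the triangular sum described above:
averagePickup(test, 5, 2) # 1.333333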

For a growing data feed in R, how can two time lengths be calculated based on "time to peak" and "time back to baseline"?

How can the following be accomplished with R?
Connect a constantly changing data source (e.g. https://goo.gl/XCM6yG) into R,
Measure time once prices start to rise consistently from initial baseline range to peak (represented by the green horizontal line),
Measure time from peak back to baseline range (the teal line)
Note: "departure from baseline range" (unless there is a better mathematical definition) is defined as the most recent 5 prices all being more than 3 standard deviations above the mean of the latest 200 prices.
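That rule can be encoded as a small predicate; this is a hedged sketch (the name departed_baseline and the exact windowing are assumptions, not from the question):
# TRUE once the most recent 5 prices all exceed mean + 3*sd of the latest 200 prices
departed_baseline <- function(prices) {
  n <- length(prices)
  if (n < 200) return(FALSE)
  base <- prices[(n - 199):n] # the latest 200 prices
  all(tail(prices, 5) > mean(base) + 3 * sd(base))
}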
This is a really vague question with an unknown use case, but... here we go.
1) Monitoring in what way? The length? That's what I did.
2) Once the vector has over 200 values we can take the mean, so we need some control flow for that part.
3) I added in some noise that forces the behaviour you want to detect (ifelse(i %in% 996:1000, 100, 0) means: if the iterator is in 996 to 1000, add 100 to the random normal generated at i). We keep a counter and check whether each value is above 3 sd of the rolling mean; once the last 5 are, we record the time.
4) At each input of the data, check whether the current value is the max value. This is trickier, since we would really have to look at the trend, which is beyond the scope of my assistance. That part is up to you to figure out, since I don't fully understand the requirement.
vec <- vecmean <- val5 <- c()
counter <- 0
for (i in 1:1000) {
  vec[i] <- rnorm(1) + ifelse(i %in% 996:1000, 100, 0)
  Sys.sleep(.001) # change to 1 second
  # 1
  cat('The vector has', length(vec), 'values within...\n')
  # 2
  if (length(vec) > 200) {
    vecmean <- c(vecmean, mean(vec[(i - 200):i]))
    cat('The mean of the last 200 observations is ',
        format(vecmean[length(vecmean)], digits = 2), '\n')
    # 3
    upr <- vecmean[length(vecmean)] + 3 * sd(vec)
    if (vec[i] > upr) {
      counter <- counter + 1
    } else {
      counter <- 0
    }
    if (counter > 4) {
      cat('Last 5 values greater than 3sd above the rolling mean!\n')
      val5 <- Sys.time()
      cat("Timestamp:", as.character(val5), '\n')
    }
  }
  # 4
  theMax <- max(vec)
  if (vec[i] == theMax & !is.null(val5)) {
    valMax <- Sys.time()
    valDiff <- valMax - val5
    cat('The time difference between the first flag and second is', as.character(valDiff), '\n')
  }
}

Improve a for loop with apply inside

I have a data.frame, ordered by its mean column, that looks like this:
10SE191_2 10SE207 10SE208 mean
7995783 12.64874 13.06391 12.69378 12.73937
8115327 12.69979 12.52285 12.41582 12.50363
8108370 12.58685 12.87818 12.66021 12.45720
7945680 12.46392 12.26087 11.77040 12.36518
7923547 11.98463 11.96649 12.50666 12.33138
8016718 12.81610 12.71548 12.48164 12.32703
I would like to apply a t.test to each row, using as input the intensity values df[i,1:3] and the mean values from the rows with lower intensities. For example, for the first row I want to compute a t.test of df[1,1:3] vs the mean values from rows 2 to 6. My code uses a for loop, but my current data.frame has more than 20,000 rows and 24 columns and it takes a long time. Any ideas for improving the code?
Thanks
Code:
temp <- matrix(-9, nrow = dim(matrix.order)[1], ncol = 2) # create a result matrix
l <- dim(matrix.order)[1]
for (i in 1:l) {
  j <- 1 + i
  if (i < l | j + 2 == l) { # avoid having too few y observations
    mean.val <- matrix.order[j:l, 4]
    p <- t.test(matrix.order[i, 1:3], mean.val)
    temp[i, 1] <- p$p.value
  } else {
    temp[i, 1] <- 1
  }
}
The dput for my df:
structure(list(`10SE191_2` = c(12.6487418898415, 12.6997932097351,12.5868508174491, 12.4639169398277, 11.9846348627906, 12.8160978540904), `10SE207` = c(13.0639063105224, 12.522848114011, 12.8781769160682, 12.260865493177, 11.9664905651469, 12.7154788700468), `10SE208` = c(12.6937808736673, 12.4158248856386, 12.6602128982717, 11.7704045448312, 12.5066604109231, 12.4816357798965), mean = c(12.7393707471856, 12.5036313008127, 12.4572035036992, 12.3651842840775, 12.3313821056582, 12.3270331271091)), .Names = c("10SE191_2", "10SE207", "10SE208", "mean"), row.names = c("7995783", "8115327", "8108370", "7945680", "7923547", "8016718"), class = "data.frame")
You can obtain all p-values (if possible) with this command:
apply(df, 1, function(x) {
  y <- df$mean[df$mean < x[4]]
  if (length(y) > 1)
    t.test(x[1:3], y)$p.value
  else NA
})
The function will return NA if there are not enough values for y.
7995783 8115327 8108370 7945680 7923547 8016718
0.08199794 0.15627947 0.04993244 0.50885253 NA NA
Running 2E4 t.tests probably takes a lot of time no matter what. Try using Rprof to find the hot spots. You might also want to use mclapply or similar parallel processing tools, since your analysis of each row is independent of all other data (which means this is a task well-suited to multicore parallel processing).
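For the parallel route, here is a hedged sketch of an mclapply() variant of the apply() answer above (it assumes a Unix-alike; on Windows, parallel::parLapply() with a cluster is the usual substitute):
library(parallel)
# same per-row test as above, spread over two cores
pvals <- mclapply(seq_len(nrow(df)), function(i) {
  y <- df$mean[df$mean < df$mean[i]]
  if (length(y) > 1) t.test(unlist(df[i, 1:3]), y)$p.value else NA
}, mc.cores = 2)
unlist(pvals)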
