R: Display plot while calculating [duplicate]

This question already has answers here:
Plotting during a loop in RStudio
(7 answers)
Closed 8 years ago.
I wrote an R script that does a lot of heavy calculation. The script enters a while loop and runs it around 16 times (each iteration lasts between 3 and 5 minutes on average).
I would like to be able to "track" the calculation's progress: to see how much has been done and how much remains to be done. So far I just print the iteration number and the calculation time for each finished loop.
What I would like to add is a kind of "progress bar", which I represent as a ggplot2 histogram. Every time a calculation finishes, I update the plot (the "finished" area grows bigger and the "remaining" area shrinks accordingly) and print it. But no plot shows up in the Plots pane of RStudio while the function is still executing; all plots appear at once when the overall calculation (all 16 loops) is finished.
So my question is: Is it possible to display a plot at the end of a while loop, even if the "overall" calculation is not over?
If I am not being clear, please tell me.
Thank you in advance for your answers.
EDIT: Sys.sleep() works very well for what I intended.
If anyone is interested in the code, here follows:
# Big calculation
iter <- iter + 1
d <- data.frame(x1 = c(1,1), x2 = c(iter/maxIt, 1 - iter/maxIt),
x3 = c("finished", "remaining"))
print(qplot(d$x1, d$x2, fill = factor(d$x3), geom = "histogram", stat = "identity"))
Sys.sleep(1)
# Preparing the next loop

timearray <- NULL # array of time used by each iteration
for (wait in c(3,2,5,4,6)) {
s <- Sys.time() # iteration time start
Sys.sleep(wait) # do your calculation -- I just wait here, for `wait` seconds
s <- Sys.time() - s # now s is last iteration time
timearray <- c(timearray, s) # add this new value to array
plot(timearray, type='h', main='Iteration time') # plot the array
}
Could it be something like that?

Related

R - Finding change points in a dataset [duplicate]

This question already has an answer here:
How to find changing points in a dataset
(1 answer)
Closed 5 years ago.
(Two images illustrating the data were attached here in the original question.)
I have a huge dataset with numeric values. I would need to find the points at which an increasing or decreasing trend starts and ends.
For example:
[100312
100317
100380
100432
100438
100441
100509
100641
100779
100919
100983
100980
100978
100983
100986
100885
100767
100758
100755
100755]
The snippet above is a small excerpt of the roughly 1 million rows in my data.
Expected output: 100317 (starting point of increase), 100432 (end point of increase), 100441 (starting point of increase), 100919 (end point of increase).
A change of ~10 is considered as noise.
You can try this code; it finds the starting and ending points of each increase:
df <- c(100312,100317,100380,100432,100438,100441,100509,100641,100779,100919,100983,100980,100978,100989,100999,100885,100767,100758,100755,100755)
indexs <- which(diff(df) >= 10)       # positions where the next step up is at least 10 (smaller changes are noise)
flag <- which(diff(indexs) > 1)       # gaps between those positions mark boundaries between runs
end <- c(flag, length(indexs))        # index (within indexs) where each run ends
start <- c(1, end[-length(end)] + 1)  # index (within indexs) where each run starts
mapply(function(x, y) c(df[indexs[x]], df[indexs[y]]), start, end)  # start and end value of each increasing run
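Run on the df above, this returns one column per increasing run, with the run's start value on top and its end value below (here: 100317/100380, 100441/100919 and 100978/100989). Note that df[indexs[y]] is the value at the beginning of the run's last qualifying step; if you want the value after that step instead (e.g. 100432 rather than 100380 for the first run), a small variant is:
mapply(function(x, y) c(df[indexs[x]], df[indexs[y] + 1]), start, end)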

Forming a Wright-Fisher loop with "sample()"

I am trying to create a simple loop to generate a Wright-Fisher simulation of genetic drift with the sample() function (I'm not dead-set on using this function, but, in my naivety, it seems like the right way to go). I know that sample() randomly selects values from a vector based on given probabilities. My goal is to create a system that keeps making random selections from successive sets: if it samples a second set from some original set of values, I'd like the loop to take another random sample from that second set (using the probabilities that were defined earlier), and so on.
I'd like to just learn how to do this in a very general way. Therefore, the specific probabilities and elements are arbitrary at this point. The only things that matter are (1) that every element can be repeated and (2) the size of the set must stay constant across generations, per Wright-Fisher. For an example, I've been playing with the following:
V <- c(1,1,2,2,2,2)
sample(V, size=6, replace=TRUE, prob=c(1,1,1,1,1,1))
Regrettably, I don't have any code to share yet, precisely because I'm not sure how to start writing this kind of loop. I know that for() loops are used to repeat a function multiple times, so my guess is to start there. However, from what I've researched, it seems you have to start with a loop variable (typically i). Nothing in this sampling problem suggests an obvious variable, which isn't to say one couldn't be made up.
If you wanted to repeatedly sample from a population with replacement for a total of iter iterations, you could use a for loop:
set.seed(144) # For reproducibility
population <- init.population
for (i in seq_len(iter)) {
population <- sample(population, replace=TRUE)
}
population
# [1] 1 1 1 1 1 1
Data:
init.population <- c(1, 1, 2, 2, 2, 2)
iter <- 100
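If you also want to watch the drift itself, here is a minimal sketch that records the frequency of allele 1 each generation. The names N, generations and freq1 are mine, chosen for illustration:
set.seed(144)
N <- 6                         # population size stays constant, per Wright-Fisher
population <- c(1, 1, 2, 2, 2, 2)
generations <- 100
freq1 <- numeric(generations)  # frequency of allele 1 in each generation
for (g in seq_len(generations)) {
  population <- sample(population, size = N, replace = TRUE)
  freq1[g] <- mean(population == 1)
}
plot(freq1, type = "l", ylim = c(0, 1),
     xlab = "Generation", ylab = "Frequency of allele 1")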

In RNetLogo how do I export a list of agentset details without overwriting the list each time?

I have a model in NetLogo where agents (turtles) move around a landscape and produce other agents (eggs) at a set rate. The latter don't move. My aim is to collect the coordinates of the eggs and measure things like nearest neighbour distance.
In RNetLogo I have some code to do this:
NLCommand("setup")
NLDoCommandWhile("day < 10", "go")
eggcoords <- list()
eggcoords <- NLGetAgentSet(c("who","xcor","ycor"), "eggs")
The problem is that the model slows to a crawl as the number of eggs increases. One solution to this is to kill off the eggs at the end of the day but store their details in a list that I can update each day without overwriting anything. And that's where I'm stuck.
Hope you can help.
If your day is a constant number of ticks (say 24), you could do something like this:
Create an end-of-day procedure in your NetLogo model where you clear the eggs.
Then call your model like this:
turtles <- list()
NLCommand("setup")
# run for 10 days:
for (day in 1:10) {
NLCommand("repeat 24 [go]")
agent_set <- NLGetAgentSet(c("who", "xcor", "ycor",
"min [ distance myself ] of other turtles"),
"turtles")
names(agent_set) <- c("who", "xcor", "ycor", "nnd")
agent_set$day = day
turtles[[day]] <- agent_set
NLCommand("end-of-day")
}
Note:
In the for loop, go is executed 24 times with a single NLCommand call.
You can use any NetLogo reporter in the agent.var argument of NLGetAgentSet, so you can calculate the nearest neighbour distance on the fly
with min [ distance myself ] of other turtles.
The result (turtles) is a list of data frames.
You can bind them into one data frame with df_turtles <- do.call(rbind, turtles).
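For example, once the combined data frame exists you could summarise the mean nearest-neighbour distance per day. This assumes the loop above has run, and that your NetLogo model defines an end-of-day procedure that simply removes the eggs (something like ask eggs [ die ]):
df_turtles <- do.call(rbind, turtles)
aggregate(nnd ~ day, data = df_turtles, FUN = mean)  # mean nearest-neighbour distance per day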

optimizing markov chain transition matrix calculations?

As an intermediate R user, I know that for loops can very often be optimized with functions like apply. However, I am not aware of functions that can optimize my current code for generating a Markov chain transition matrix, which runs quite slowly. Have I maxed out on speed, or are there things I am overlooking? I am trying to find the transition matrix for a Markov chain by counting the number of occurrences in the 24-hour periods before given alerts. The vector ids contains all possible id's (about 1700).
The original matrix looks like this, as an example:
>matrix
id time
1 1376084071
1 1376084937
1 1376023439
2 1376084320
2 1372983476
3 1374789234
3 1370234809
And here is my code to try to handle this:
matrixtimesort <- matrix[order(-matrix$time),]
frequency = 86400 #number of seconds in 1 day
# Initialize matrix that will contain probabilities
transprobs <- matrix(data=0, nrow=length(ids), ncol=length(ids))
# Loop through each type of event
for (i in 1:length(ids)){
localmatrix <- matrix[matrix$id==ids[i],]
# Loop through each row of the event
for(j in 1:nrow(localmatrix)) {
localtime <- localmatrix[j,]$time
# Find top and bottom row number defining the 1-day window
indices <- which(matrixtimesort$time < localtime & matrixtimesort$time >= (localtime - frequency))
# Find IDs that occur within the 1-day window
positiveids <- unique(matrixtimesort[c(min(indices):max(indices)),]$id)
# Add one to each cell in the matrix that corresponds to the occurrence of an event
for (l in 1:length(positiveids)){
k <- which(ids==positiveids[l])
transprobs[i,k] <- transprobs[i,k] + 1
}
}
# Divide each row by total number of occurrences to determine probabilities
transprobs[i,] <- transprobs[i,]/nrow(localmatrix)
}
# Normalize rows so that row sums are equal to 1
normalized <- transprobs/rowSums(transprobs)
Can anyone make any suggestions to optimize this for speed?
Using nested loops seems a bad idea; your code can be vectorized to speed it up.
For example, why find the top and bottom row numbers at all? You can simply compare all the time values against the window boundaries (localtime and localtime - frequency): that comparison is a vectorized operation.
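A sketch of what that could look like, using findInterval() on the time-sorted data to locate each 1-day window in one pass. The object names (events, counts, lo, hi) are mine, not from the question:
events <- matrix[order(matrix$time), ]  # same data, sorted ascending by time
ids <- sort(unique(events$id))
frequency <- 86400                      # number of seconds in 1 day
# For each event, the rows whose time falls in [time - frequency, time):
lo <- findInterval(events$time - frequency, events$time, left.open = TRUE) + 1
hi <- findInterval(events$time, events$time, left.open = TRUE)
counts <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
for (k in seq_len(nrow(events))) {
  if (lo[k] <= hi[k]) {
    prior <- unique(events$id[lo[k]:hi[k]])  # ids seen in the day before this event
    counts[as.character(events$id[k]), as.character(prior)] <-
      counts[as.character(events$id[k]), as.character(prior)] + 1
  }
}
transprobs <- counts / rowSums(counts)  # rows with no prior events give NaN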
HTH.

Simulate coin toss for one week?

This is not homework. I am interested in setting up a simulation of a coin toss in R. I would like to run the simulation for a week. Is there a function in R that will allow me to start and stop the simulation over a time period such as a week? If all goes well, I may want to increase the length of the simulation period.
For example:
x <- rbinom(10, 1, 1/2)
So to clarify: instead of 10 in the code above, how do I keep the simulation going for a week (the number of trials in a week versus a set number of trials)? Thanks.
Here is code that will continue to run for three seconds, then stop and print the totals.
x <- Sys.time()
duration <- 3 # number of seconds
heads <- 0
tails <- 0
while(Sys.time() <= x + duration){
s <- sample(0:1, 1)
if(s == 1) heads <- heads+1 else tails <- tails+1
cat(s) # print the same flip that was just counted
}
cat("heads: ", heads)
cat("tails: ", tails)
The results:
001100111000011010000010110111111001011110100110001101101010 ...
heads: 12713
tails: 12836
Note of warning:
At the speed of my machine, I bet you run into floating point limits long before the end of the week. In other words, you may reach the maximum value your machine can store exactly as an integer, double or float, and from then on your counts are no longer reliable.
So you may have to build in some error checking or rollover mechanism to protect you from this.
For an accelerated illustration of what will happen, try the following:
x <- 1e300
while(is.finite(x)){
x <- x+x
cat(x, "\n")
}
R deals with the floating point overflow gracefully, and returns Inf.
So, whatever data you had in the simulation is now lost. It's not possible to analyse infinity to any sensible degree.
Keep this in mind when you design your simulation.
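One possible rollover mechanism is to checkpoint the running totals to disk at regular intervals, so a crash loses at most one interval of work. A minimal sketch, assuming a week-long run and a made-up file name checkpoint.rds:
flips <- 0
heads <- 0
deadline <- Sys.time() + 7 * 24 * 3600  # one week from now
while (Sys.time() < deadline) {
  heads <- heads + rbinom(1, 1, 0.5)
  flips <- flips + 1
  if (flips %% 1e6 == 0) {               # every million flips...
    saveRDS(list(flips = flips, heads = heads), "checkpoint.rds")  # ...persist progress
  }
}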
While the current time is earlier than a week-later timestamp, append rbinom(1, 1, 1/2) to x:
week_later <- strptime("2012-06-22 16:45:00", "%Y-%m-%d %H:%M:%S")
x <- rbinom(1, 1, 1/2) # init x
while (as.numeric(Sys.time()) < as.numeric(week_later)) {
  x <- append(x, rbinom(1, 1, 1/2)) # note: growing x one element at a time gets slow
}
You may be interested in the fairly new package harvestr by Andrew Redd. It splits a task into pieces (the idea being that pieces could be run in parallel). The part of the package that applies to your question is that it caches the results of pieces that have already been processed, so that if the task is interrupted and restarted, the pieces that have finished will not be rerun; it will pick up on those that did not complete (pieces that were interrupted part-way through will start from the beginning of that piece).
This may let you start and stop the simulation as you request.
