First, I should say that I'm pretty new to R. I wrote some R code that runs over thousands of iterations. The code works and produces the results I need, but it takes far too long to run. I'll first explain what the code does and then show the code itself. How can I make it more efficient so that it runs in a reasonably short time over 200K+ iterations?
A while loop runs until the total dollars reach the target dollars. First I generate a random number and look it up in the Prob column of the first table below, which returns the Dist column (stored as a string). I parse the string, draw a value from that distribution, and add it to a vector. Then I use this value for another lookup in the second table below to get a factor, and I save the factor for each value in a second vector. I repeat this until I reach my target dollars. Then I multiply the two vectors to get my result vector. This while loop is itself repeated 200K+ times.
Prob   Range     Dist
.12    5000      rgamma(1, 3, , 900) + 1000
.70    100000    rgamma(1, 1, , 900) + 5000
.85    350000    rgamma(1,0.9, , 150000) + 200000
.95    1500000   rgamma(1,0.8, , 230000) + 200000
1.0    2500000   runif(1, 1500000, 2500000)
Range     Factor
5000      rweibull(1, 20, 1.1)
100000    rweibull(1, 30, 1.2)
250000    rweibull(1, 25, 1.5)
2500000   rweibull(1, 25, 1.8)
Sample code is below. I've used dummy values in many places, and the real code has a couple more operations similar to the ones shown. Running this 100 times takes about a minute. When I run it thousands of times, it will take too long. How can I make this code more efficient?
t <- proc.time()
sims <- 100
totalD <- 0
totalRev <- c(150000000)
i <- 0
ProbRnge <- matrix(c(0.12, 0.70, 0.85, 0.95, 1,
5000, 100000, 350000, 1500000, 2500000,
1000, 5000, 100000, 350000, 1500000), ncol=3)
Dis1 <- c("rgamma(1, 3.0268, , 931.44) + 1000", "rgamma(1, 1.0664, , 931.44) + 5000",
"rgamma(1, 1.0664, , 931.44) + 5000", "rgamma(1, 1.0664, , 931.44) + 5000",
"runif(1, 1250000, 2000000)")
SizeRnge <- c(5000, 100000, 250000, 2500000)
Dis2 <- c("rweibull(1, 20, 1.1)", "rweibull(1, 30, 1.2)", "rweibull(1, 25, 1.5)",
"rweibull(1, 25, 1.8)")
# simulation loop
for (j in 1:sims) {
  TotalDTemp <- NULL
  FacTmp <- NULL
  TotalDTemp <- vector()
  FacTmp <- vector()
  # loop until the simulated total reaches the target total
  while (totalD < totalRev[1]) {
    i = i + 1
    # find where the random number falls in the range, look up the distribution,
    # draw a value and store it in a vector
    row_i <- which.max(ProbRnge[,1] > runif(1))
    tmpSize <- max(min(eval(parse(text=Dis1[row_i])), ProbRnge[row_i, 2]), ProbRnge[row_i, 3])
    if (totalD + tmpSize > totalRev[1]) {
      tmpSize = totalRev[1] - totalD
      totalD = totalD + tmpSize
    } else {
      totalD = totalD + tmpSize
    }
    TotalDTemp[i] <- tmpSize
    # take the value, look up the factor to apply and store it in a vector
    row_i <- which.max(SizeRnge > tmpSize)
    tempRTR <- max(min(eval(parse(text=Dis2[row_i])), 2), 1)
    FacTmp[i] <- tempRTR
  }
  DfacTotal <- TotalDTemp * FacTmp
  totalD = 0
  i = 0
}
proc.time() - t
If you profile your code, you will see that most of the time is spent parsing the expressions. You can do that once, before the loops, by computing
expr1 <- lapply(Dis1, function(text) parse(text = text))
expr2 <- lapply(Dis2, function(text) parse(text = text))
Then use eval(expr1[[row_i]]) instead of eval(parse(text=Dis1[row_i])), and likewise expr2 for Dis2.
For me, this reduces computation time from 45 sec to less than 2 sec.
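A minimal sketch of the idea, for illustration only: it parses a shortened copy of Dis1 once and checks that evaluating the pre-parsed expressions gives the same draws as parsing on every call. The helper names draw_parsed_each_time and draw_preparsed are hypothetical and not part of the question's code; timings will vary by machine.

```r
# Shortened copy of the question's distribution strings (illustrative).
Dis1 <- c("rgamma(1, 3.0268, , 931.44) + 1000",
          "rgamma(1, 1.0664, , 931.44) + 5000",
          "runif(1, 1250000, 2000000)")

# Parse each string once, before any loop.
expr1 <- lapply(Dis1, function(text) parse(text = text))

# Hypothetical helpers: one parses on every draw, one reuses expr1.
draw_parsed_each_time <- function(rows) {
  vapply(rows, function(r) eval(parse(text = Dis1[r])), numeric(1))
}
draw_preparsed <- function(rows) {
  vapply(rows, function(r) eval(expr1[[r]]), numeric(1))
}

rows <- sample(seq_along(Dis1), 5000, replace = TRUE)

# Same seed, same sequence of draws: the results should be identical.
set.seed(1); a <- draw_parsed_each_time(rows)
set.seed(1); b <- draw_preparsed(rows)
same_draws <- identical(a, b)

# Compare the cost of the two variants.
time_each <- system.time(draw_parsed_each_time(rows))
time_pre  <- system.time(draw_preparsed(rows))
```

Storing the distributions as functions rather than strings would avoid eval altogether, but that changes how the lookup tables are defined, so it is only worth doing if the strings are not needed elsewhere.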
Recently, I learned how to write a loop that initializes some number, and then randomly generates numbers until the initial number is guessed (while recording the number of guesses it took) such that no number will be guessed twice:
# https://stackoverflow.com/questions/73216517/making-sure-a-number-isnt-guessed-twice
all_games <- vector("list", 100)
for (i in 1:100){
  guess_i = 0
  correct_i = sample(1:100, 1)
  guess_sets <- 1:100 ## initialize a set
  trial_index <- 1
  while(guess_i != correct_i){
    guess_i = sample(guess_sets, 1) ## sample from this set
    guess_sets <- setdiff(guess_sets, guess_i) ## remove it from the set
    trial_index <- trial_index + 1
  }
  ## no need to store `i` and `guess_i` (as same as `correct_i`), right?
  game_results_i <- data.frame(i, trial_index, guess_i, correct_i)
  all_games[[i]] <- game_results_i
}
all_games <- do.call("rbind", all_games)
I am now trying to modify the above code to create the following two loops:
(Deterministic) Loop 1 always guesses the midpoint (rounded up) and is told whether the guess is smaller or bigger than the correct number. It then takes the midpoint again (e.g. between its guess and the floor/ceiling) until it reaches the correct number.
(Semi-Deterministic) Loop 2 first makes a random guess and is told whether the guess is bigger or smaller than the number. It then halves the difference and makes its next guess randomly in the smaller range. It repeats this process until it reaches the correct number.
I tried to write a sketch of the code:
#Loop 2:
correct = sample(1:100, 1)
guess_1 = sample(1:100, 1)
guess_2 = ifelse(guess_1 > correct, sample(50:guess_1, 1), sample(guess_1:100, 1))
guess_3 = ifelse(guess_2 > correct, sample(50:guess_2, 1), sample(guess_2:100, 1))
guess_4 = ifelse(guess_3 > correct, sample(50:guess_3, 1), sample(guess_3:100, 1))
But I am not sure if I am doing this correctly.
Can someone please help me with this?
Thank you!
Example : Suppose I pick the number 68
Loop 1: first random guess = 51, (100-51)/2 + 51 = 75, (75-50)/2 + 50 = 63, (75 - 63)/2 + 63 = 69, (69 - 63)/2 + 63 = 66, etc.
Loop 2: first random guess = 53, rand_between(53,100) = 71, rand_between(51,71) = 65, rand(65,71) = 70, etc.
I don't think you need a for loop for this; you can build the structures up front with sample, sapply and which:
## correct values can repeat, so we set replace to TRUE
corrects <- sample(1:100, 100, replace = TRUE)
## replace is by default FALSE in sample(), if you don't want repeated guesses
## sapply() creates a matrix
guesses <- sapply(1:100, function(x) sample(1:100, 100))
## constructing game_results_i equal to yours, but could be simplified
game_results_i <- data.frame(
  i = 1:100,
  trial_index = sapply(
    1:100,
    function(x) which(
      ## which() returns the positions where the predicate is true; each column of
      ## guesses is a permutation of 1:100, so there is exactly one such position
      guesses[, x] == corrects[x]
    )
  ),
  guess_i = corrects,
  correct_i = corrects # guess_i and correct_i are obviously equal
)
Ok, let's see if now I match question and answer properly :)
If I understood your intentions correctly, both loops set increasingly tight lower and upper bounds, so each guess reduces the search space. However, this interpretation does not always match your description, so please double-check that it is acceptable for your purposes.
I wrote two functions, guess_bisect for the deterministic loop_1 and guess_sample for loop_2:
guess_bisect <- function(correct, n = 100) {
  lb <- 0
  ub <- n + 1
  trial_index <- 1
  guess <- round((ub - lb) / 2) + lb
  while (guess != correct) {
    # cat(lb, ub, guess, "\n") # uncomment to print the guess iteration
    if (guess < correct)
      lb <- guess
    else
      ub <- guess
    guess <- round((ub - lb) / 2) + lb
    trial_index <- trial_index + 1
  }
  trial_index # number of guesses needed
}
guess_sample <- function(correct, n = 100) {
  lb <- 0
  ub <- n + 1
  trial_index <- 1
  guess <- sample((lb + 1):(ub - 1), 1)
  while (guess != correct) {
    # cat(lb, ub, guess, "\n") # uncomment to print the guess iteration
    if (guess < correct)
      lb <- guess
    else
      ub <- guess
    # sample(x, 1) draws from 1:x when x is a single number, so index explicitly
    candidates <- (lb + 1):(ub - 1)
    guess <- candidates[sample.int(length(candidates), 1)]
    trial_index <- trial_index + 1
  }
  trial_index # number of guesses needed
}
Obviously, guess_bisect always produces the same result for the same input, whereas guess_sample varies randomly.
Plotting the results in a simple chart suggests that deterministic bisection is on average much better, since random sampling may happen to narrow the range from the wrong side. The x-axis is the correct number, spanning 1 to 100, and the y-axis is the trial index; guess_bisect gives the red curve, and many runs of guess_sample give the blue curves.
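The comparison can also be checked numerically. The following is a rough sketch, not the code behind the chart: play, pick_mid and pick_rand are hypothetical names introduced here, and the random strategy is averaged over 200 runs per target, which is an arbitrary choice.

```r
# Generic guessing game: `pick` chooses the next guess strictly between lb and ub.
play <- function(correct, pick, n = 100) {
  lb <- 0
  ub <- n + 1
  trials <- 0
  repeat {
    guess <- pick(lb, ub)
    trials <- trials + 1
    if (guess == correct) return(trials)
    if (guess < correct) lb <- guess else ub <- guess
  }
}

# Deterministic strategy: midpoint of the current bounds.
pick_mid <- function(lb, ub) lb + round((ub - lb) / 2)

# Random strategy: uniform draw from the remaining candidates.
# Indexing avoids the sample(x, 1) pitfall when only one candidate is left.
pick_rand <- function(lb, ub) {
  candidates <- (lb + 1):(ub - 1)
  candidates[sample.int(length(candidates), 1)]
}

set.seed(1)
bisect_trials <- sapply(1:100, play, pick = pick_mid)
random_trials <- sapply(1:100, function(k) mean(replicate(200, play(k, pick_rand))))

mean_bisect <- mean(bisect_trials)
mean_random <- mean(random_trials)
```

Comparing mean_bisect with mean_random gives a single-number version of the red and blue curves; plotting bisect_trials and random_trials against 1:100 reproduces the shape of the chart.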
The following code runs a loop, but the problem is the speed: it takes several hours to finish, and I am looking for an alternative so that I don't have to wait so long.
Basically, the code does the following calculations:
1.-It calculates the mean of the values of the 60 days.
2.-It gets the standard deviation of the values of the 60 days.
3.-It gets the Max of the values of the 60 days.
4.-It gets the Min of the values of the 60 days.
5.-Then, with the previous calculations, the code "smooths" the peaks up and down.
6.-Then the code simply gets the means over the last 60, 30, 15 and 7 days.
So the purpose of this code is to remove the peaks from the data using the method described above.
Here is the code:
DAT <- data.frame(ITEM = "x", CLIENT = as.numeric(1:100000), matrix(sample(1:1000, 60, replace=T), ncol=60, nrow=100000, dimnames=list(NULL,paste0('DAY_',1:60))))
nRow <- nrow(DAT)
TMP <- NULL # result accumulator, grown row by row below
for(iROW in 1:nRow){#iROW <- 1
  Demand <- NULL
  for(iCOL in 3:ncol(DAT)){#iCOL <- 1
    Demand <- c(Demand,DAT[iROW,iCOL])
  }
  ww <- which(!is.na(Demand))
  if(length(ww) > 0){
    Average <- round(mean(Demand[ww]),digits=4)
    DesvEst <- round(sd(Demand,na.rm=T),digits=4)
    Max <- round(Average + (1 * DesvEst),digits=4)
    Min <- round(max(Average - (1 * DesvEst), 0),digits=4)
    Demand <- round(ifelse(is.na(Demand), Demand, ifelse(Demand > Max, Max, ifelse(Demand < Min, Min, Demand))))
    Prom60 <- round(mean(Demand[ww]),digits=4)
    Prom30 <- round(mean(Demand[intersect(ww,(length(Demand) - 29):length(Demand))]),digits=4)
    Prom15 <- round(mean(Demand[intersect(ww,(length(Demand) - 14):length(Demand))]),digits=4)
    Prom07 <- round(mean(Demand[intersect(ww,(length(Demand) - 6):length(Demand))]),digits=4)
  }else{
    Average <- DesvEst <- Max <- Min <- Prom60 <- Prom30 <- Prom15 <- Prom07 <- NA
  }
  DAT[iROW,3:ncol(DAT)] <- Demand
  TMP <- rbind(TMP, cbind(DAT[iROW,], Average, DesvEst, Max, Min, Prom60, Prom30, Prom15, Prom07))
}
If one runs your code (with a smaller number of rows) through a profiler, one sees that the main issue is the rbind at the end, followed by the c mentioned by @Riverarodrigoa.
We can address both by creating numeric matrices of suitable size and working with those. The final data.frame is created only at the end:
N <- 1000
DAT <- data.frame(ITEM = "x",
CLIENT = as.numeric(1:N),
matrix(sample(1:1000, 60, replace=T), ncol=60, nrow=N, dimnames=list(NULL,paste0('DAY_',1:60))))
nRow <- nrow(DAT)
TMP <- matrix(0, ncol = 8, nrow = N,
dimnames = list(NULL, c("Average", "DesvEst", "Max", "Min", "Prom60", "Prom30", "Prom15", "Prom07")))
DemandMat <- as.matrix(DAT[,3:ncol(DAT)])
for(iROW in 1:nRow){
  Demand <- DemandMat[iROW, ]
  ww <- which(!is.na(Demand))
  if(length(ww) > 0){
    Average <- round(mean(Demand[ww]),digits=4)
    DesvEst <- round(sd(Demand,na.rm=T),digits=4)
    Max <- round(Average + (1 * DesvEst),digits=4)
    Min <- round(max(Average - (1 * DesvEst), 0),digits=4)
    Demand <- round(ifelse(is.na(Demand), Demand, ifelse(Demand > Max, Max, ifelse(Demand < Min, Min, Demand))))
    Prom60 <- round(mean(Demand[ww]),digits=4)
    Prom30 <- round(mean(Demand[intersect(ww,(length(Demand) - 29):length(Demand))]),digits=4)
    Prom15 <- round(mean(Demand[intersect(ww,(length(Demand) - 14):length(Demand))]),digits=4)
    Prom07 <- round(mean(Demand[intersect(ww,(length(Demand) - 6):length(Demand))]),digits=4)
  }else{
    Average <- DesvEst <- Max <- Min <- Prom60 <- Prom30 <- Prom15 <- Prom07 <- NA
  }
  DemandMat[iROW, ] <- Demand
  TMP[iROW, ] <- c(Average, DesvEst, Max, Min, Prom60, Prom30, Prom15, Prom07)
}
DAT <- cbind(DAT[,1:2], DemandMat, TMP)
For 1000 rows this takes about 0.2 s instead of over 4 s. For 10,000 rows I get 2 s instead of 120 s.
Obviously, this is not really pretty code. One could do this much more nicely using tidyverse or data.table. I just find it worth noting that for loops are not necessarily slow in R; what is slow is dynamically growing data structures.
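For comparison, here is a hedged sketch of a loop-free base R variant that works on the whole matrix at once. It is not from the answer above: the matrix X and its NA positions are made-up test data, last_mean is a hypothetical helper, and rows that are entirely NA (which yield NaN here rather than NA) are not handled specially.

```r
# Illustrative data: N clients x 60 days, with a few missing values.
set.seed(42)
N <- 1000
X <- matrix(sample(1:1000, N * 60, replace = TRUE), nrow = N,
            dimnames = list(NULL, paste0("DAY_", 1:60)))
X[sample(length(X), 50)] <- NA

# Row-wise statistics without an explicit loop.
Average <- round(rowMeans(X, na.rm = TRUE), 4)
DesvEst <- round(apply(X, 1, sd, na.rm = TRUE), 4)
Max <- round(Average + DesvEst, 4)
Min <- round(pmax(Average - DesvEst, 0), 4)

# Clamp each row to [Min, Max]. The length-N limits recycle down the columns,
# so row i is always compared with Min[i] and Max[i]. NA cells stay NA.
Clamped <- round(pmin(pmax(X, Min), Max))

# Hypothetical helper: mean of the last k days of the clamped data, per row.
last_mean <- function(k) {
  round(rowMeans(Clamped[, (60 - k + 1):60, drop = FALSE], na.rm = TRUE), 4)
}
Prom60 <- last_mean(60)
Prom30 <- last_mean(30)
Prom15 <- last_mean(15)
Prom07 <- last_mean(7)

result <- data.frame(CLIENT = seq_len(N), Clamped,
                     Average, DesvEst, Max, Min, Prom60, Prom30, Prom15, Prom07)
```

The clamping relies on R's column-major recycling, which only lines up because Min and Max have exactly one entry per row; with a transposed layout the limits would pair with the wrong cells.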
I am working on spike trains. My code to generate and plot spike trains for 20 trials is written below (the accompanying image showed 5 trials for illustration).
fr = 100        # firing rate
dt = 1/1000     # bin width of 1 ms, expressed in seconds
duration = 2    # duration in s
nBins = 2000    # number of bins (duration / dt)
nTrials = 20    # number of simulated trials
MyPoissonSpikeTrain = function(p, fr= 100) {
  p = runif(nBins)
  q = ifelse(p < fr*dt, 1, 0)
  return(q)
}
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
plot(x=-1,y=-1, xlab="time (s)", ylab="Trial",
     main="Spike trains",
     ylim=c(0.5, nTrials+1), xlim=c(0, duration))
for (i in 1: nTrials) {
  clip(x1 = 0, x2= duration, y1= (i-0.2), y2= (i+0.4))
  abline(h=i, lwd= 1/4)
  abline(v= dt*which( SpikeMat[i,]== 1))
}
Each trial has spikes occurring at random time points. What I am working towards is picking one random time point shared by all 20 trials and getting, for each trial, the length of the inter-spike interval that this point falls into. The code to get the vector of times at which the spikes occur is:
A <- numeric()
for (i in 1: nTrials) {
  ISI <- function(i){
    spike_times <- c(dt*which( SpikeMat[i, ]==1))
    ISI1vec <- c(diff(spike_times))
    A <- c(A, ISI1vec)
  }
}
Then you call ISI(i) for whichever trial you wish to see the inter-spike interval vector for.
I want to get a vector that has the lengths of the intervals this point falls into, one for each trial. I want to figure out its distribution as well, but that's for later. Can anybody help me figure out how to code my way to this? Any help is appreciated, even if it's just about how to start or where to look.
Your data
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
I suggest transforming your sparse matrix data into a list of indices where spikes occur
L <- lapply(seq_len(nrow(SpikeMat)), function(i) setNames(which(SpikeMat[i, ] == 1), seq_along(which(SpikeMat[i, ] == 1))))
Grab random timepoint
RT <- round(runif(1) * ncol(SpikeMat))
# 531
distances contains the distances to the 2 nearest spikes - each element of the list is a named vector where the values are the distances (to RT) and their names are their positions in the vector. nearest_columns shows the original timepoint (column number) of each spike in SpikeMat.
bookend_values <- function(vec) {
  lower_val <- head(sort(vec[sign(vec) == 1]), 1)        # nearest spike before RT
  upper_val <- head(sort(abs(vec[sign(vec) == -1])), 1)  # nearest spike after RT
  return(c(lower_val, upper_val))
}
distances <- lapply(L, function(i) bookend_values(RT-i))
nearest_columns <- lapply(seq_along(distances), function(i) L[[i]][names(distances[[i]])])
Note that the inter-spike interval of the two nearest spikes that bookend RT can be obtained with
sapply(distances, sum)
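As an alternative sketch (not part of the answer above), base R's findInterval can locate the pair of spikes that bracket the random time point directly. The simulated SpikeMat below mirrors the question's setup, and enclosing_interval is a hypothetical helper name. It returns NA for a trial where the point lies before the first or after the last spike, and if the point coincides with a spike it uses the interval that starts at that spike.

```r
# Simulated spike matrix in the style of the question (illustrative values).
set.seed(1)
nTrials <- 20
nBins <- 2000
dt <- 1 / 1000
fr <- 100
SpikeMat <- t(replicate(nTrials, as.integer(runif(nBins) < fr * dt)))

# Random time point, expressed as a bin (column) index shared by all trials.
RT <- sample.int(nBins, 1)

# Hypothetical helper: length (in seconds) of the inter-spike interval
# containing bin `rt` for one trial.
enclosing_interval <- function(spike_row, rt, dt) {
  s <- which(spike_row == 1)           # sorted spike bins
  k <- findInterval(rt, s)             # number of spikes at or before rt
  if (k == 0 || k == length(s)) return(NA_real_)  # no bracketing pair
  (s[k + 1] - s[k]) * dt
}

interval_len <- apply(SpikeMat, 1, enclosing_interval, rt = RT, dt = dt)
```

Repeating the draw of RT many times and pooling interval_len would give an empirical distribution of the enclosing interval lengths.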
How to generate a vector that satisfies some conditions?
Problem: generate a vector a such that length(a)=400000 which is made up of 8 elements:0, 5, 10, 50, 500, 5000, 50000, 300000. Each element appears a set number of times, namely 290205, 100000, 8000, 1600, 160, 32, 2, 1, respectively. Further, a is blocked into 4,000 "groups" of 100 consecutive elements; call them a_k, k=1,...,4000. These groups must satisfy the following:
The sum of every group exceeds 150, i.e. sum_i a_k_i>150 for all k.
The elements 5, 10 and 50 appear between 25 and 29 times in each group, i.e. for all k, the set {i | a_k_i in (5,10,50)} has between 25 and 29 elements.
0 never appears more than 8 times in a row in any group.
I have tried this many times, but it does not seem to work. My current code is as follows:
T <- 4*10^(5) # data size
x <- c(0, 5, 10, 50, 500, 5000, 50000, 300000) #seed vector
t <- c(290205, 100000, 8000, 1600, 160, 32, 2, 1) #frequency
A <- matrix(0, 4000, 100) #4000 groups
k <- rep(0, times = 8) #record the number of seeds
for(m in 1:4000) {
p <- (t - k)/(T - 100*(m - 1)) #seed probability
A[, m] <- sample(x, 100, replace = TRUE, prob = p) #group m
sm <- 0
i <- 0
for(j in 1:92) {
if(sum(A[m,j:j + 8])==0){
if(A[m,j] > 0 & A[m,j] < 500) {i <- i+1}
sm <- sm+A[100*m+j]
else j <- 0
if (sm >= 150 & i > 24 & i < 30 & j != 0) {
m <- m + 1
for (n in seq_len(x)) {
k[n] <- sum(A[, m+1] == x[n]) + k[n]
How about just doing it by construction? For example:
This satisfies all your conditions:
Element counts:
[1] 290205 100000 8000 1600 160 32 2 1
Group sums:
> table(colSums(amat)>=150)
5,10,50 frequency:
> table(sapply(1:4000,function(x)abs(sum(amat[,x] %in% c(5,10,50))-27)<=2))
Runs of 0:
> table(sapply(1:4000,function(x)max(rle(amat[,x])$lengths[rle(amat[,x])$values==0])<=8))
#If this is slow, we can just use max(rle(amat[,x])$lengths)<=8
# because there aren't many valid groups with strings of 9+
# non-0 elements
If in fact we're never allowed to have strings of 9 0s, we'll need to make a slight adjustment to groups 2:2206, because, e.g. a[100:108]==0
I can start it off and maybe someone can help get to the next step. My approach is to start with the constraints and let sample work out the numbers.
choose <- c(0,5,10,50,500,5000,50000,300000)
freqs <- c(290205,100000,8000,1600,160,32,2,1)
probs <- freqs/sum(freqs)
check.sum <- function(vec) sum(vec) >= 150
check.interval <- function(vec) abs(sum(vec %in% c(5,10,50))-27)<=2
check.runs <- function(vec, runmax=8) max(rle(vec)$lengths[rle(vec)$values==0]) <= runmax
check.all <- function(vector) {
  logicals <- c(check.sum(vector),
                check.interval(vector),
                check.runs(vector))
  all(logicals)
}
nums <- NULL
res <- list()
for(i in 1:4000) {
  nums <- numeric(100)
  while(!check.all(nums)) {nums <- sample(choose, 100, replace=T,prob=probs)}
  res[i] <- list(nums)
}
List of 4000
$ : num [1:100] 1e+01
So this gets you a list of 4,000 groups of 100 numbers that fit the constraints. It only took about two seconds of system time.
Next step is for someone to get a way to build something similar except eliminate 300000 once it is used, and 50000 once it is used twice and so on.
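One possible building block for that next step, offered as a sketch rather than a full solution: draw each group without replacement from the pool of copies that are still available, then subtract what was used. draw_group is a hypothetical helper, and it does not enforce the three group conditions; a caller would redraw until the checks pass and only then commit the updated counts.

```r
# Element values and their required total counts, as given in the question.
values <- c(0, 5, 10, 50, 500, 5000, 50000, 300000)
counts <- c(290205, 100000, 8000, 1600, 160, 32, 2, 1)

# Hypothetical helper: draw one group from the remaining copies and
# return both the group and the counts left after the draw.
draw_group <- function(values, remaining, size = 100) {
  pool <- rep(seq_along(values), remaining)     # one index per remaining copy
  idx <- pool[sample.int(length(pool), size)]   # sample without replacement
  used <- tabulate(idx, nbins = length(values)) # copies consumed per element
  list(group = values[idx], remaining = remaining - used)
}

set.seed(1)
out <- draw_group(values, counts)
first_group <- out$group
left_over <- out$remaining
```

Because the draw is without replacement, an element such as 300000 can never appear more often than its remaining count allows, which is the property the fixed-probability sampling above lacks.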
Inspired by @plafort's approach, I've come up with the following that seems to work very quickly and should be capable of generating all vectors satisfying your conditions:
grp.cond2<-function(x)abs(sum(x %in% c(5,10,50))-27)<=2
I've also written the code in a way that should be easy to generalize to other element sets/counts, group numbers, and group-wise conditions.
Unfortunately it seems these conditions are pretty stringent, and it may take a long time to generate an acceptable a. I let the while loop run ~1300 times with no success...
Thanks, everyone! I have figured out my problem.
rm(list = ls())
media <- matrix(rep(rep(c(0,5,NA),c(72,25,3)),4000),nrow=100)
media[98:100,1:2400] <-c(10,10,10)
media[98:99,2401:3200] <-c(50,10)
media[98:99,3201:4000] <-c(50,0)
media[100,2401:4000] <-rep(c(0,500,5000,50000,300000),c(1405,160,32,2,1))
obj1 <- matrix(0,100L,4000)
obj2 <-obj1
grp.cond<-function(x) max(rle(x)$lengths[rle(x)$values==0])<=8
for(i in 1:4000){
freq<-c(sapply(elts, function(x) length(which(media[,i]==x))))
for(i in 1:4000){obj2[,i]<-obj1[,a1[i]]}
a <- c(obj2)
I am new to R but trying desperately to learn the ropes. In fact I feel a little stupid asking this question as I have gone through a number of similar problems but have not been able to get the desired results. My code is as shown below :
## Initializing Parameters
fstart <- 960 ## Start frequency in MHz
fstop <- 1240 ## Stop Frequency In MHz
bw <- 5.44 ## IF Bandwidth in MHz
offset <- 100 ## Max. Variation in TOD in milliseconds
f_dwell <- 1 ## Time spent on each search frequency in millisecond
iterations <- 100 ## No. of iterations to run
## No. of possible frequencies
f <- seq((fstart + bw/2), (fstop - bw/2), by=bw)
## Initializing the frequency table
freq_table <- matrix (NA, nrow=(2*offset +1), ncol=offset)
## Fill frequency table row wise with random values of possible frequencies
for (i in 1:(2*offset + 1)){
  row_value <- c(sample(f), sample(f, offset-length(f)))
  freq_table[i, ] <- row_value
}
## Assign a row from freq_table to unknown node
unknown_node <- freq_table[sample(1:(2*offset + 1), 1), ]
t = numeric(iterations)
## Calculate number of repetitions of frequencies
for(k in 1:iterations){
  for(j in 1:offset){
    y <- (sort(table(freq_table[, j]), decreasing=TRUE))
    x <- as.vector(y) ## Number of repetitions of each frequency
    y <- names(y)
    ## Search Frequencies
    sf1 <- as.numeric(y[1])
    sf2 <- as.numeric(y[2])
    if (unknown_node[j] == sf1){
      t[k] <- ((j-1)*f_dwell)*2 + f_dwell
    } else {
      if (unknown_node[j] == sf2){
        t[k] <- ((j-1)*f_dwell)*2 + 2*f_dwell
      }
    }
    ## Delete rows from freq_table that have sf1 & sf2
    freq_table <- subset(freq_table, freq_table[, 1]!=sf1 & freq_table[, 1]!=sf2 )
  }
}
If I run this without the k for loop, I get different values of the variable t every time. However, I want to run the inner for loop repeatedly and get a vector of t values, one for each run of the inner loop. I do get a t of length 100, but the values repeat. The first few values (2 or 3, or sometimes 4) are different, but the rest keep repeating. I can't figure out why.