I am trying to run a for loop that contains a while loop and an if statement. The code works outside the loop but not inside it. This is a very simplified version that tries to collect sets xx[j], each containing 10 numbers.
When I run it, x is never actually added to the set xx[j], and I'm not sure why.
n <- 10
xx <- list()
for (j in 1:5) {
xx[j] <= NULL
while (length(xx[j]) < n) {
x <- runif(1)
if (0.5 <= x) {
xx[j] <- c(xx[j], x)
}
}
}
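I've fixed and tidied up the code.
The changes are:
Elements of a list are accessed with double brackets, xx[[j]]. Single brackets, xx[j], return a sub-list of length 1, so length(xx[j]) never reaches n
The list is created with its target length of 5
The step that set the list elements to NULL is removed, since they are NULL initially (the original line also used <=, a comparison, instead of the assignment <-)
The fixed code: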
n = 10
xx = vector('list', 5)
for (j in seq_along(xx)) {
  while (length(xx[[j]]) < n) {
    x = runif(1)
    if (0.5 <= x) {
      xx[[j]] = c(xx[[j]], x)
    }
  }
}
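To see why the brackets matter, here is a minimal sketch comparing what single and double brackets return for a freshly created list. The values 0.7 and 0.9 are arbitrary placeholders, not taken from the original code.

```r
# A list with two empty (NULL) slots, as created in the fixed code
xx <- vector("list", 2)

# Single brackets return a sub-list, which always has length 1 here
class(xx[1])       # "list"
length(xx[1])      # 1

# Double brackets return the element itself, which starts out as NULL
is.null(xx[[1]])   # TRUE
length(xx[[1]])    # 0

# Appending through double brackets grows the element itself
xx[[1]] <- c(xx[[1]], 0.7)
xx[[1]] <- c(xx[[1]], 0.9)
length(xx[[1]])    # 2
```

Because length(xx[j]) is 1 from the start and stays 1, a condition such as length(xx[j]) < n can never become FALSE for n = 10.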
I'm trying to solve problem #14 of Project Euler.
The main objective is to find the length of a Collatz sequence.
First I solved the problem with a regular loop:
compute <- function(n) {
result <- 0
max_chain <- 0
hashmap <- 1
for (i in 1:n) {
chain <- 1
number <- i
while (number > 1) {
if (!is.na(hashmap[number])) {
chain <- chain + hashmap[number]
break
}
if (number %% 2 == 0) {
chain <- chain + 1
number <- number / 2
} else {
chain <- chain + 2
number <- (3 * number + 1) / 2
}
}
hashmap[i] <- chain
if (chain > max_chain) {
max_chain <- chain
result <- i
}
}
return(result)
}
It takes only 2 seconds for n = 1000000.
I decided to replace the while loop with recursion:
len_collatz_chain <- function(n, hashmap) {
get_len <- function(n) {
if (is.na(hashmap[n])) {
hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
}
return(hashmap[n])
}
get_len(n)
return(hashmap)
}
compute <- function(n) {
result <- 0
max_chain <- 0
hashmap <- 1
for (i in 1:n) {
hashmap <- len_collatz_chain(i, hashmap)
print(length(hashmap))
if (hashmap[i] > max_chain) {
max_chain <- hashmap[i]
result <- i
}
}
return(result)
}
This solution works, but it is very slow: almost 1 minute for n = 10000.
I suppose one of the reasons is that R creates the hashmap object each time len_collatz_chain is called.
I know about the Rcpp package, and yes, the first solution works fine, but I can't understand where I'm going wrong.
Any tips?
For example, my recursive Python solution runs in 1 second with n = 1000000:
def len_collatz_chain(n: int, hashmap: dict) -> int:
if n not in hashmap:
hashmap[n] = 1 + len_collatz_chain(n // 2, hashmap) if n % 2 == 0 else 2 + len_collatz_chain((3 * n + 1) // 2, hashmap)
return hashmap[n]
def compute(n: int) -> int:
result, max_chain, hashmap = 0, 0, {1: 1}
for i in range(2, n):
chain = len_collatz_chain(i, hashmap)
if chain > max_chain:
result, max_chain = i, chain
return result
The main difference between your R and Python code is that in R you use a vector for the hashmap, while in Python you use a dictionary, and that the hashmap is passed as a function argument many times.
In Python, when a dictionary is passed as a function argument, only a reference to the actual data is transferred to the called function. This is fast. The called function works on the same data as the caller.
In R, a vector passed as a function argument is copied as soon as the called function modifies it (copy-on-modify). This is potentially slow, but safer in the sense that the called function cannot alter the caller's data.
This is the main reason that Python is so much faster in your code.
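The difference can be illustrated with a small sketch. The helper names set_first and set_first_env are made up for this illustration. It contrasts a vector argument, where the function works on its own copy, with an environment, which R passes by reference:

```r
# Hypothetical helper: modifies its vector argument and returns it
set_first <- function(v) {
  v[1] <- 99      # the modification acts on a copy of the caller's vector
  v
}

a <- c(1, 2, 3)
b <- set_first(a)
a[1]              # 1: the caller's vector is unchanged
b[1]              # 99

# Hypothetical helper: modifies a vector stored inside an environment
set_first_env <- function(e) {
  e$v[1] <- 99    # environments are not copied, so the caller sees this
  invisible(NULL)
}

env <- new.env()
env$v <- c(1, 2, 3)
set_first_env(env)
env$v[1]          # 99
```

With a long hashmap vector, the copy in the first case is repeated on every call that modifies it, which is where the time goes.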
You can, however, alter the R code slightly so that the hashmap is no longer passed as a function argument:
len_collatz_chain <- local({
  hashmap <- 1L
  get_len <- function(n) {
    if (is.na(hashmap[n])) {
      hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
    }
    hashmap[n]
  }
  get_len
})
compute <- function(n) {
  result <- rep(NA_integer_, n)
  for (i in seq_len(n)) {
    result[i] <- len_collatz_chain(i)
  }
  result
}
compute(n=10000)
This makes the R code much faster. (Python will probably still be faster though).
Note that I have also removed the return statements in the R code, as they are not needed and add one level to the call stack.
I want to create a script that calculates probabilities for a role-playing game.
I'm new to programming and I'm stuck on return values and nested functions. What I want is to use the values returned by the first function in the next one.
I have two functions, dice(k, n) and fight(a, b) (for the example, the functions are only partly written):
dice <- function (k, n) {
if (k > 3 && n > 2){
a <- 3
b <- 2
attack <- sample(1:6, a)
deff <- sample(1:6, b)
}
return(c(attack, deff))
}
So I want to use the vectors attack and deff in the next function:
fight <- function(a, b){
if (a == 3 && b == 2){
if(sort(attack,T)[1] > sort(deff,T)[1]){
n <- n - 1}
if (sort(attack,T)[1] <= sort(deff,T)[1]) {
k <- k - 1}
if (sort(attack,T)[2] > sort(deff,T)[2]) {
n <- n - 1}
if (sort(attack,T)[2]<= sort(deff,T)[2]){
k <- k - 1}
}
return(c(k, n))
}
But this gives me the following error:
Error in sort(attack, T) : object 'attack' not found
Any ideas? Thanks!
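One possible way around the error, sketched under several assumptions: attack and deff exist only inside dice(), so they have to be returned and then passed to fight() explicitly. The signatures below differ from the original ones, the use of replace = TRUE (so that dice can show the same face twice) is an assumption about the game, and only the k > 3, n > 2 case from the question is covered.

```r
# Roll the dice and return both vectors in a named list
dice <- function(k, n) {
  if (k > 3 && n > 2) {
    list(attack = sample(1:6, 3, replace = TRUE),
         deff   = sample(1:6, 2, replace = TRUE))
  } else {
    stop("only the k > 3, n > 2 case is sketched here")
  }
}

# Compare the highest dice pairwise; k and n are passed in explicitly
fight <- function(rolls, k, n) {
  attack <- sort(rolls$attack, decreasing = TRUE)
  deff   <- sort(rolls$deff, decreasing = TRUE)
  pairs  <- seq_len(min(length(attack), length(deff)))
  attacker_wins <- attack[pairs] > deff[pairs]
  # Each pair the attacker wins costs n one unit, every other pair costs k one
  c(k = k - sum(!attacker_wins), n = n - sum(attacker_wins))
}

rolls  <- dice(5, 4)
result <- fight(rolls, k = 5, n = 4)

# A fixed roll, to check the comparison logic by hand
fixed <- fight(list(attack = c(6, 5, 1), deff = c(5, 5)), k = 5, n = 4)
fixed   # k = 4, n = 3
```

Returning a list rather than c(attack, deff) keeps the two vectors separate, so fight() does not have to guess where one ends and the other begins.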
I get a problem when I run this program in R.
Can anybody help me solve it?
par_1<-cbind(c(5.038159),c(3.899621))
par_2<-cbind(c(2.435457),c(13.89517))
tau<-365
cdf2 <- function(x, help) {
pgamma(x, shape=par_1[1], scale=par_1[2]) *
pgamma(x, shape=par_2[1], scale=par_2[2])-help
}
nextEventTime <- function(censoring) {
randomNumber <- runif(n=1, min=0, max=1)
pnew <- randomNumber * (1 - cdf2(censoring, 0)) + cdf2(censoring, 0)
uniroot(f=cdf2, interval=c(0, 1000*tau), help=pnew)$root
}
hazardRate1 <- function(t) {
dgamma(t, shape=par_1[1], scale=par_1[2]) /
(1 - pgamma(t, shape=par_1[1], scale=par_1[2]))
}
hazardRate2 <- function(t) {
dgamma(t, shape=par_2[1], scale=par_2[2]) /
(1 - pgamma(t,shape=par_2[1], scale=par_2[2]))
}
nextEventType <- function(t) {
p <- hazardRate1(t)/(hazardRate1(t)+hazardRate2(t))
randomNumber <- runif(n=1, min=0, max=1)
if (randomNumber <= p) {1} else {2}
}
baris<-c(1:20000)
nexteventtime<-rep(0,time=20000)
nexteventype<-rep(0,time=20000)
dfnexteventime<-data.frame(baris,nexteventtime,nexteventype)
for(i in 1:nrow(dfnexteventime)){
dfnexteventime$nexteventtime[i]<-nextEventTime(dfnexteventime$nexteventtime[i])
dfnexteventime$nexteventype[i]<-nextEventType(dfnexteventime$nexteventtime[i])
}
View(dfnexteventime)
When I run this program, it fails with the following error:
Error in if (randomNumber <= p) { : missing value where TRUE/FALSE needed
I think the problem is that the t value in the nextEventType(t) function can't be zero (t != 0).
But nextEventTime(dfnexteventime$nexteventtime[i]) never produces a zero value. I ran this part 10 times,
baris<-c(1:20000)
nexteventtime<-rep(0,time=20000)
nexteventype<-rep(0,time=20000)
dfnexteventime<-data.frame(baris,nexteventtime,nexteventype)
for(i in 1:nrow(dfnexteventime)){
dfnexteventime$nexteventtime[i]<-nextEventTime(dfnexteventime$nexteventtime[i])
}
without the nextEventType function, and it never produced a zero value.
So I'm confused: what is the problem?
I want nextEventType(t) never to receive a zero value,
because a zero value would lead to Error in if (randomNumber <= p) { :...
Your problem isn't calling nextEventType(t) on zero, since this will never happen. However, the same error occurs whenever nextEventType(t) is called on a value of t greater than 195. At this point, the term pgamma(t, shape=par_1[1], scale=par_1[2]) is so close to one that R evaluates 1 - pgamma(t, shape=par_1[1], scale=par_1[2]) to zero, so hazardRate1(t) returns Inf. nextEventType(t) then computes p as Inf/Inf, which is NaN, so the comparison randomNumber <= p is NA and if() has no TRUE/FALSE value to test.
> p <- hazardRate1(196)/(hazardRate1(196) + hazardRate2(196))
> p
[1] NaN
This will only happen in very extreme cases, when you happen to draw > 195 in nextEventTime(t), which only occurs around once in 30,000 random draws. That's why you don't see it when you run it 10 times, but often you do when you run it 20,000 times.
random_draws <- numeric()
for(i in 1:1000000) random_draws[i] <- nextEventTime(0)
length(which(random_draws > 195))
# > [1] 28
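One way to avoid the cancellation, offered as a sketch rather than a drop-in replacement, is to ask pgamma for the upper tail directly and to work on the log scale. The function names hazard_naive and hazard_stable are made up for this illustration; lower.tail, log.p and log are standard arguments of pgamma and dgamma. The parameters are the ones from the question, stored as plain vectors.

```r
par_1 <- c(5.038159, 3.899621)
par_2 <- c(2.435457, 13.89517)

# Hazard as written in the question: 1 - pgamma(...) rounds to 0 for large t
hazard_naive <- function(t, par) {
  dgamma(t, shape = par[1], scale = par[2]) /
    (1 - pgamma(t, shape = par[1], scale = par[2]))
}

# Hazard computed from the log density and the log of the upper tail
hazard_stable <- function(t, par) {
  log_density  <- dgamma(t, shape = par[1], scale = par[2], log = TRUE)
  log_survival <- pgamma(t, shape = par[1], scale = par[2],
                         lower.tail = FALSE, log.p = TRUE)
  exp(log_density - log_survival)
}

p_naive  <- hazard_naive(196, par_1) /
  (hazard_naive(196, par_1) + hazard_naive(196, par_2))
p_stable <- hazard_stable(196, par_1) /
  (hazard_stable(196, par_1) + hazard_stable(196, par_2))

is.nan(p_naive)      # TRUE, as in the output above
is.finite(p_stable)  # TRUE
```

This keeps p a valid probability at t = 196. Whether it stays well behaved for every value that nextEventTime can return has not been checked here.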
I have a really complex function, so I won't post it here, but within this function I want to run a loop. The result of each iteration should be saved in a vector entry. I also want an if/else statement within this loop.
When I run the loop I get the vector vr, but it has only NULL entries except for the last one. How can I adjust the loop so that every result is saved in the vector and not only the last one?
for (i in length(y)) {
if( y[i] == 0) {
vr[i] <- fittedValuesFullModell[i] - y[i]
} else {
vr[i] <- sign(y[i] - fittedValuesFullModell[i]) *
sqrt(2 *(y[i] * log(y[i] / fittedValuesFullModell[i]) -
(y[i] - fittedValuesFullModell[i])))
}
}
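Note that for (i in length(y)) iterates only once, with i equal to length(y), which is why only the last entry is filled; for (i in seq_along(y)) visits every index. Beyond that, consider vapply or sapply, which are designed to iterate over a series and save the outputs directly into a data structure (i.e., vector, matrix, array) instead of initializing an empty one and filling it in a loop: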
vr <- vapply(seq_along(y), function(i) {
  if (y[i] == 0) {
    fittedValuesFullModell[i] - y[i]
  } else {
    sign(y[i] - fittedValuesFullModell[i]) *
      sqrt(2 * (y[i] * log(y[i] / fittedValuesFullModell[i]) -
                (y[i] - fittedValuesFullModell[i])))
  }
}, numeric(1))  # each call returns a single number
In fact, looking closer, you may be able to use mapply together with ifelse(), since you are iterating elementwise through the y and fittedValuesFullModell objects and single values are passed into the function:
vr <- mapply(function(i, f)
  ifelse(i == 0, f - i, sign(i - f) * sqrt(2 * (i * log(i / f) - (i - f)))),
  y, fittedValuesFullModell)
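Since ifelse() is itself vectorized, the same calculation can probably be written without any explicit iteration. The sketch below uses made-up values for y and a shorter placeholder name, fitted_values, in place of fittedValuesFullModell, and checks the result against a corrected loop:

```r
# Made-up example data; fitted_values stands in for fittedValuesFullModell
y             <- c(0, 2, 5, 1)
fitted_values <- c(0.5, 1.5, 6, 1)

# Vectorized version: ifelse() picks, element by element, between the branches
vr <- ifelse(
  y == 0,
  fitted_values - y,
  sign(y - fitted_values) *
    sqrt(2 * (y * log(y / fitted_values) - (y - fitted_values)))
)

# Reference: the original loop, iterating over seq_along(y)
vr_loop <- numeric(length(y))
for (i in seq_along(y)) {
  if (y[i] == 0) {
    vr_loop[i] <- fitted_values[i] - y[i]
  } else {
    vr_loop[i] <- sign(y[i] - fitted_values[i]) *
      sqrt(2 * (y[i] * log(y[i] / fitted_values[i]) -
                (y[i] - fitted_values[i])))
  }
}

all.equal(vr, vr_loop)  # TRUE
```

Note that ifelse() evaluates both branches for every element, so the second branch produces NaN where y is 0; those values are discarded because the first branch is selected there.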
I'm trying to speed up code that takes time series data, limits it to a maximum value, and then stretches it forward until the sums of the original data and the "stretched" data are the same.
I have a more complicated version of this that is taking 6 hours to run on 100k rows. I don't think this is vectorizable because it uses values calculated on prior rows - is that correct?
x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
dat <- data.frame(x=x,y=rep(0,length(x)))
remainder <- 0
upperlimit <- 2000
for(i in 1:length(dat$x)){
if(dat$x[i] >= upperlimit){
dat$y[i] <- upperlimit
} else {
dat$y[i] <- min(remainder,upperlimit)
}
remainder <- remainder + dat$x[i] - dat$y[i]
}
dat
I understand you can use ifelse but I don't think cumsum can be used to carry forward the remainder - apply doesn't help either as far as I know. Do I need to resort to Rcpp? Thank you greatly.
I went ahead and implemented this in Rcpp and made some adjustments to the R function:
require(Rcpp);require(microbenchmark);require(ggplot2);
limitstretchR <- function(upperlimit,original) {
remainder <- 0
out <- vector(length=length(original))
for(i in 1:length(original)){
if(original[i] >= upperlimit){
out[i] <- upperlimit
} else {
out[i] <- min(remainder,upperlimit)
}
remainder <- remainder + original[i] - out[i]
}
out
}
The Rcpp function:
cppFunction('
NumericVector limitstretchC(double upperlimit, NumericVector original) {
int n = original.size();
double remainder = 0.0;
NumericVector out(n);
for(int i = 0; i < n; ++i) {
if (original[i] >= upperlimit) {
out[i] = upperlimit;
} else {
out[i] = std::min<double>(remainder,upperlimit);
}
remainder = remainder + original[i] - out[i];
}
return out;
}
')
Testing them:
x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
original <- rep(x,20000)
upperlimit <- 2000
system.time(limitstretchR(upperlimit,original))
system.time(limitstretchC(upperlimit,original))
That yielded 80.655 and 0.001 seconds respectively. Native R is quite bad for this. However, I ran a microbenchmark (using a smaller vector) and got some confusing results.
res <- microbenchmark(list=
list(limitstretchR=limitstretchR(upperlimit,rep(x,10000)),
limitstretchC=limitstretchC(upperlimit,rep(x,10000))),
times=110,
control=list(order="random",warmup=10))
print(qplot(y=time, data=res, colour=expr) + scale_y_log10())
boxplot(res)
print(res)
If you were to run that, you would see nearly identical results for both functions. This is my first time using microbenchmark. Any tips?
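A likely explanation, offered as an assumption rather than a verified diagnosis: list(...) evaluates both calls before microbenchmark ever sees them, so the benchmark only times looking up two stored result vectors. The base R sketch below shows the difference between list() and alist(); slow_sum is a made-up stand-in for the functions being timed.

```r
# Made-up stand-in for an expensive function
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# list() evaluates its arguments immediately: it stores the result
evaluated <- list(a = slow_sum(1e5))
is.numeric(evaluated$a)   # TRUE: the work has already been done

# alist() stores the unevaluated call instead
unevaluated <- alist(a = slow_sum(1e5))
is.call(unevaluated$a)    # TRUE: nothing has run yet

# The call does its work only when it is evaluated, e.g. by a timing harness
value <- eval(unevaluated$a)
value                     # 5000050000
```

With microbenchmark, passing the calls directly as arguments, for example microbenchmark(limitstretchR(upperlimit, original), limitstretchC(upperlimit, original), times = 100), or building the list with alist() instead of list(), should time the actual work.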