Replace Loops in R function

I'm new to R, and I'm having trouble figuring out how to replace the FOR loop in the function below. The function estimates a population mean. Any help at all would be much appreciated. Thank you!
myFunc <- function() {
  myFRAME <- read.csv(file = "2008short.csv", header = TRUE)
  meanTotal <- 0
  for (i in 1:100) {
    mySample <- sample(myFRAME$TaxiIn, 100, replace = TRUE)
    tempMean <- mean(mySample)
    meanTotal <- meanTotal + tempMean
  }
  cat("Estimated Mean: ", meanTotal/100, "\n") # print result
}

As Rob suggests, your loop is unnecessary, but in the spirit of the question, the replicate()
function can directly replace your for loop, like so:
myFunc <- function() {
  myFRAME <- read.csv(file = "2008short.csv", header = TRUE)
  # replicate() evaluates the expression 100 times and returns the 100 sample means
  meanTotal <- mean(replicate(100, mean(sample(myFRAME$TaxiIn, 100, replace = TRUE))))
  cat("Estimated Mean: ", meanTotal, "\n")
}
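To check that replicate() really is a drop-in replacement, here is a small self-contained comparison. The vector taxi_in is hypothetical simulated data standing in for myFRAME$TaxiIn (the CSV file is not available here). With the same seed, the loop and replicate() consume the random number stream in the same order, so they should return exactly the same 100 sample means.

```r
# taxi_in is hypothetical simulated data standing in for myFRAME$TaxiIn
set.seed(42)
taxi_in <- rpois(500, lambda = 6)

# for-loop version: store each of the 100 sample means
set.seed(123)
loop_means <- numeric(100)
for (i in 1:100) {
  loop_means[i] <- mean(sample(taxi_in, 100, replace = TRUE))
}

# replicate() version: the same expression evaluated 100 times
set.seed(123)
rep_means <- replicate(100, mean(sample(taxi_in, 100, replace = TRUE)))

identical(loop_means, rep_means)  # TRUE: same draws, in the same order
mean(rep_means)                   # the estimate myFunc() would print
```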

Your code takes the mean of 100 sample means, each based on a sample of 100 observations. Because every sample has the same size, this is equivalent to taking the mean of a single sample of 10,000 observations drawn with replacement. So the following does the same thing:
myFunc <- function() {
  myFRAME <- read.csv(file = "2008short.csv", header = TRUE)
  meanTotal <- mean(sample(myFRAME$TaxiIn, 10000, replace = TRUE))
  cat("Estimated Mean: ", meanTotal, "\n")
}
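The equivalence is plain algebra: with equal sample sizes, the mean of the sample means equals the mean of all the pooled observations. A quick numerical check, again using hypothetical simulated data in place of the CSV column, draws 10,000 values once and views them as 100 samples of 100:

```r
# taxi_in is hypothetical simulated data standing in for myFRAME$TaxiIn
set.seed(1)
taxi_in <- rpois(500, lambda = 6)

# Draw 10,000 values once and view them as 100 samples of 100 (one per column)
draws <- sample(taxi_in, 100 * 100, replace = TRUE)
m <- matrix(draws, nrow = 100)

mean_of_means <- mean(colMeans(m))  # what the original loop computes
pooled_mean   <- mean(draws)        # what the single large sample computes

all.equal(mean_of_means, pooled_mean)  # TRUE, up to floating-point rounding
```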

Related

Using function "cat" with "replicate" in R

Is there a way to combine the function "cat" with the function "replicate" in R?
I want to see how many iterations ("loops") R has already completed at any given moment. However, instead of using a "for" loop, I prefer to use "replicate". See the simple example below:
Data <- rnorm(20, 20, 3)
# with a for loop
N <- 1000
outcome <- NULL
for (i in 1:N) {
  Data.boot <- sample(Data, replace = TRUE)
  outcome[i] <- mean(Data.boot)
  cat("\r", i, "of", N)
}
# the same, but with replicate
f <- function() {
  Data.boot <- sample(Data, replace = TRUE)
  outcome <- mean(Data.boot)
  return(outcome)
}
replicate(N, f())
So any ideas on how to use "cat" with "replicate" (or other ways to see how many times the function of interest has been executed by "replicate") would be much appreciated. Thank you!
As an alternative, you could use sapply instead of replicate:
Data <- rnorm(20, 20, 3)
N <- 1000
f <- function(i) {
  Data.boot <- sample(Data, replace = TRUE)
  cat("\r", i, "of", N)
  mean(Data.boot)
}
outcome <- sapply(1:N, f)
Or, alternatively, using plyr, you could use raply with the .progress option (if your main purpose is to see how far along you are):
outcome <- plyr::raply(N, mean(sample(Data, replace = TRUE)), .progress = "text")
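If you would rather not depend on plyr, base R's utils package (attached in a default session) provides txtProgressBar() and setTxtProgressBar(). A minimal sketch combining them with vapply(), whose index argument tells the bar how far along it is:

```r
set.seed(1)
Data <- rnorm(20, 20, 3)
N <- 1000

# Text progress bar from the utils package
pb <- txtProgressBar(min = 0, max = N, style = 3)

outcome <- vapply(seq_len(N), function(i) {
  setTxtProgressBar(pb, i)            # move the bar to iteration i
  mean(sample(Data, replace = TRUE))  # one bootstrap mean
}, numeric(1))

close(pb)
```

vapply() is used instead of sapply() only because it guarantees a numeric vector of length N; either would work here.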
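You could use scoping in the following way, keeping a counter in the global environment (reset i to 0 before each run):
i <- 0
f <- function() {
  Data.boot <- sample(Data, replace = TRUE)
  outcome <- mean(Data.boot)
  i <<- i + 1  # update the counter in the enclosing (here, global) environment
  print(i)
  return(outcome)
}
outcome <- replicate(N, f())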
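A variant of the same scoping idea keeps the counter inside a closure, so it starts at zero every time you build a new function and no global variable is touched. This is a sketch; make_counted_f is a hypothetical helper name, and reporting only every 100th iteration is an arbitrary choice to keep console output light.

```r
set.seed(1)
Data <- rnorm(20, 20, 3)
N <- 1000

# Hypothetical helper: returns a function that counts its own calls
make_counted_f <- function(N, every = 100) {
  i <- 0                                  # counter lives in this closure
  function() {
    i <<- i + 1                           # updates the closure's i, not a global
    if (i %% every == 0) cat("\r", i, "of", N)
    mean(sample(Data, replace = TRUE))    # one bootstrap mean
  }
}

f <- make_counted_f(N)
outcome <- replicate(N, f())
cat("\n")
```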

Speed up sample function in R

In a fairly large data frame, I have to pick some random rows to execute a function. In my example, the first function I use is the variance, and the second, called f below, is close to the real one I use in my script. I won't detail the purpose of f, but it deals with truncated Gaussian distributions and maximum-likelihood estimation.
My problem is that my code is far too slow with the second function, and I suppose some optimization of the for loop or of the sample function could help.
Here is the code:
df <- as.data.frame(matrix(0, 2e+6, 2))
df$V1 <- runif(nrow(df), 0, 1)
df$V2 <- sample(c(1:10), nrow(df), replace = TRUE)
nb.perm <- 100 # number of permutations
res <- c()
for (i in 1:nb.perm) res <- rbind(res, tapply(df[sample(1:nrow(df)), "V1"], df$V2, var))
library(truncnorm)
f <- function(d) # d is a vector
{
  f2 <- function(x) -sum(log(dtruncnorm(d, a = 0, b = 1, mean = x[1], sd = x[2])))
  res <- optim(par = c(mean(d), sd(d)), fn = f2)
  if (res$convergence != 0) warning("Optimization has not converged")
  return(list(res1 = res$par[1], res2 = res$par[2]^2))
}
res2 <- c()
for (i in 1:nb.perm) res2 <- rbind(res2, tapply(df[sample(1:nrow(df)), "V1"], df$V2, function(x) f(x)$res2))
I hope I am clear enough.
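Two overheads in the loops above are easy to remove: indexing the whole data frame by row just to permute V1, and growing the result with rbind(). The sketch below (with a smaller n so it runs quickly) permutes only the V1 vector and lets replicate() collect the results. With the same seed, sample(df$V1) gives exactly the same permutation as df[sample(1:nrow(df)), "V1"], since both call sample.int() on the same length. For the second loop, the repeated optim() fits inside f are probably the dominant cost, so this restructuring likely helps less there.

```r
set.seed(1)
n <- 2e5  # smaller than the question's 2e6 so the sketch runs quickly
df <- data.frame(V1 = runif(n, 0, 1), V2 = sample(1:10, n, replace = TRUE))
nb.perm <- 100

# Same permutation either way
set.seed(7); a <- df[sample(1:nrow(df)), "V1"]
set.seed(7); b <- sample(df$V1)
identical(a, b)  # TRUE

# Permute only the V1 vector; replicate() collects one column per permutation,
# and t() turns that into one row per permutation, like the rbind() loop
res <- t(replicate(nb.perm, tapply(sample(df$V1), df$V2, var)))
dim(res)  # 100 10
```

The same pattern would apply to the second loop, e.g. res2 <- t(replicate(nb.perm, tapply(sample(df$V1), df$V2, function(x) f(x)$res2))), though that line is untested here because it needs truncnorm.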

Application of mclapply() to a function writing to a global variable

I'm trying to use parallel::mclapply to speed up the calculation of the following code:
library(raster)
library(HistogramTools) # for AddHistograms
# Create a first h here for the first band... omitted for brevity
readNhist <- function(n, mconst) {
  l <- raster(filename[i], varname = var[i], band = n, na.rm = T)
  gain(l) <- mconst
  h <<- AddHistograms(h, hist(l, plot = F, breaks = histbreaks, right = FALSE))
}
lapply(1:10000, readNhist, mconst = 1)
# Then do stuff with the h histogram...
When I run the code above, everything is fine. With mclapply (below), the result is miles away from what I want to obtain: the histograms are all wrong.
library(raster)
library(HistogramTools) # for AddHistograms
library(parallel)
# Create a first h here for the first band... omitted for brevity
readNhist <- function(n, mconst) {
  l <- raster(filename[i], varname = var[i], band = n, na.rm = T)
  gain(l) <- mconst
  h <<- AddHistograms(h, hist(l, plot = F, breaks = histbreaks, right = FALSE))
}
mclapply(2:10000, readNhist, mconst = 1)
# Then do stuff with the h histogram...
I feel like there's something vital I'm missing with the application of parallel computation to this function.
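The problem is the <<-, which is bad practice in general as far as I can gather. mclapply runs readNhist in forked child processes, and each child works on its own copy of h; the <<- assignments update those copies, which are discarded when the children exit, so the h in the main session never accumulates the results.
Instead, the function can be rearranged so that it returns each histogram: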
readNhist <- function(n, mconst) {
  l <- raster(filename, varname = var, band = n, na.rm = TRUE)
  gain(l) <- mconst
  # return the histogram instead of modifying a global variable
  hist(l, plot = FALSE, breaks = histbreaks, right = FALSE)
}
And called like this:
hists <- mclapply( 2:nbands, readNhist, mconst=gain, mc.cores=ncores )
ch <- AddHistograms(x=hists)
h <- AddHistograms(h, ch)
rm(ch, hists)
This is pretty fast even with a huge number of layers (and thus histograms).
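A toy illustration of why the <<- version fails, assuming a Unix-alike system (mclapply forks, and mc.cores > 1 is not supported on Windows): each forked worker updates its own copy of the accumulator and then exits, so the parent never sees the changes. The names total and add_global are hypothetical.

```r
library(parallel)

total <- 0
add_global <- function(x) {
  total <<- total + x   # modifies whichever copy of `total` this process has
  invisible(NULL)
}

# Sequential: the global accumulator is updated as expected
invisible(lapply(1:10, add_global))
seq_total <- total      # 55

# Forked workers: each child updates its own copy of `total`, then exits
total <- 0
invisible(mclapply(1:10, add_global, mc.cores = 2))
fork_total <- total     # still 0 in the parent process

# The fix used above: return results from each call and combine them in the parent
par_total <- sum(unlist(mclapply(1:10, function(x) x, mc.cores = 2)))  # 55
```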

Iterate a function R code

I have a function myF(g,m,alpha,gam,theta,beta) which returns three parameter estimates. I want to run this function 10 times (for i in 1:10). How can I do this in R?
myF <- function(g,m,alpha,gam,theta,beta){
dat <- sim.data(g,m,alpha,gam,theta,beta)
time <- dat$times
delta <- dat$cens
i <- dat$group
X1<-dat$cov #cov~rbinom
n <- length(levels(as.factor(i)))
di <- aggregate(delta,by=list(i),FUN=sum)[,2]
D <- sum(di)
loglik <- function(par){
.........................................
return(-lik)
}
initial=c(0.5,0.5,-0.5,0.5)
maxF <- nlm(loglik, initial)
return(c(theta=exp(maxF$estimate[2]),beta1=maxF$estimate[3],alpha=exp(maxF$estimate[2])))
}
This can easily be done using replicate:
replicate(10, myF(g,m,alpha,gam,theta,beta))
This will create a 3-by-10 matrix of parameter estimates, where each column is the result of a separate iteration.
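Since sim.data() and the likelihood are not shown, here is a sketch with a hypothetical stand-in, toy_myF(), which, like myF, returns a named vector of three estimates. Its estimators are placeholders, not the real model; the point is the shape of what replicate() returns and how to get one row per iteration instead. set.seed() makes the 10 runs reproducible.

```r
# Hypothetical stand-in for myF(): placeholder estimators, not the real model
toy_myF <- function(n = 50) {
  x <- rnorm(n, mean = 2, sd = 1.5)
  c(theta = mean(x), beta1 = sd(x), alpha = median(x))
}

set.seed(2024)
est <- replicate(10, toy_myF())
dim(est)        # 3 10: one row per parameter, one column per iteration
rownames(est)   # "theta" "beta1" "alpha"

est_by_iter <- t(est)   # 10 x 3: one row per iteration, if that layout is handier
colMeans(est_by_iter)   # average estimate of each parameter over the 10 runs
```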

R: Artificial set generation

I would like to generate a set with a growing number of copies of some representative value.
In the end I need a matrix or a data.frame with 100 rows, where row i contains i copies of the representative (in this example, the value 1). But I get the following error. What is the trick? What am I missing?
Error: no function to return from, jumping to top level
for(i in 1:100) {
x <- c(rep(1,i),rep(100000,(2500-i)))
return(x)
}
Many thanks!
You can only use return within a function. One solution is to create a matrix to store the results in, something like this:
m = matrix(0, ncol=100, nrow=2500)

for(i in 1:100) {
  m[,i] = c(rep(1, i), rep(100000, (2500-i)))
}
should do the trick. Or using the sapply function:
m1 = sapply(1:100, function(i) c(rep(1, i), rep(100000,(2500-i))))
For info, your rep calls can also be simplified to:
rep(c(1, 100000), c(i, 2500-i))
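Putting the two answers together: the sketch below wraps the simplified rep() call in a hypothetical helper, make_set(), checks it against the original two-rep() version, and transposes the sapply() result so that each of the 100 sets is a row, as the question asked (sapply() returns one set per column).

```r
# Hypothetical helper: i copies of 1 followed by (n - i) copies of 100000
make_set <- function(i, n = 2500) rep(c(1, 100000), times = c(i, n - i))

# Sanity check: identical to the two-call version from the question
stopifnot(identical(make_set(7), c(rep(1, 7), rep(100000, 2500 - 7))))

# sapply() gives one set per column (2500 x 100); t() gives one set per row
m1 <- sapply(1:100, make_set)
sets <- t(m1)
dim(sets)                  # 100 2500
rowSums(sets == 1)[1:5]    # 1 2 3 4 5: row i holds i copies of the representative
```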
