I am executing this operation, and the process gives me NaN:
Choques <- 1:10
print(Choques)
pr <- 0
n <- 3818
p <- 0.040633627
for (i in Choques) {
  pr[i] <- (factorial(n) / (factorial(Choques[i]) * factorial(n - Choques[i]))) *
    p^Choques[i] * (1 - p)^(n - Choques[i])
  print(pr[i])
}
However, changing the variable n to a smaller number, say 20, makes it print numbers. I would like to know if there is any method to get numbers instead of NaN; I suppose the intermediate values are too big.
Replace
factorial(n)/(factorial(Choques[i])*factorial(n-Choques[i]))
with
choose(n, Choques[i])
For n = 3818, factorial(n) overflows to Inf in double precision, and Inf/Inf is NaN; choose() computes the binomial coefficient directly without forming the huge intermediate factorials.
You are computing binomial probabilities in a loop and accumulating them in a vector. You can drop the factorials and even the loop by just using the vectorized function dbinom. Your loop can be replaced by the single line:
pr <- dbinom(Choques, n, p)
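To see that the two fixes agree, here is a quick check (a sketch reusing the values from the question; the pr_choose and pr_dbinom names are mine):

Choques <- 1:10
n <- 3818
p <- 0.040633627
pr_choose <- choose(n, Choques) * p^Choques * (1 - p)^(n - Choques)
pr_dbinom <- dbinom(Choques, n, p)
all.equal(pr_choose, pr_dbinom) # TRUE: both avoid overflowing factorial(n), no NaN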
I've tried a couple of ways of doing this problem but am having trouble with how to write it. I think I did the first three steps correctly, but now I have to fill the vector z with numbers from y that are divisible by four, not divisible by three, and have an odd number of digits. I know that I'm using the print function in the wrong way; I'm just at a loss on what else to use ...
This is different from that other question because I'm not using a while loop.
#Step 1: Generate 1,000,000 random, uniformly distributed numbers between 0
#and 1,000,000,000, and name as a vector x. With a seed of 1.
set.seed(1)
x=runif(1000000, min=0, max=1000000000)
#Step 2: Generate a rounded version of x with the name y
y=round(x,digits=0)
#Step 3: Empty vector named z
z=vector("numeric",length=0)
#Step 4: Create for loop that populates z vector with the numbers from y that are divisible by
#4, not divisible by 3, with an odd number of digits.
for (i in y) {
  if (i %% 4 == 0 && i %% 3 != 0 && nchar(i, type = "chars", allowNA = FALSE, keepNA = NA) %% 2 != 0) {
    print(z, i)
  }
}
NOTE: As per @BenBolker's comment, a loop is an inefficient way to solve your problem here. Generally, in R, try to avoid loops where possible to maximise the efficiency of your code. @SymbolixAU has provided an example of doing so in the comments. Having said that, in aid of helping you learn the ins and outs of loops and vectors, here's a solution which only requires a change to one line of your code:
You've got the vector created before the loop; that's a good start. Now, inside your loop, you need to populate that vector. To do so, you've currently got print(z,i), which won't really do much. What you need is to change the vector itself:
z <- c( z, i )
Should work for you (just replace that print line in your loop).
What's happening here is that we're taking the existing z vector, binding i to the end of it, and making that new vector z again. So every time a value is added, the vector gets a little longer, such that you'll end up with a complete vector.
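For reference, a vectorized version along the lines mentioned in the comments (my own sketch, not necessarily @SymbolixAU's exact code) needs no loop at all; logical indexing with the element-wise & operator selects the qualifying values in one line:

z <- y[y %% 4 == 0 & y %% 3 != 0 & nchar(y) %% 2 != 0]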
Where you have print, put this instead:
z <- append(z, i)
Suppose I am running the following loop (trying to achieve convergence for y):
j=1
maxiter=10
for (j<maxiter) {
y[j]=y[j-1]+1 # just as an example, e.g. a Newton-Raphson-style update
j=j+1
}
I want to run this loop 500 times and put the resulting values of y in a matrix. More specifically, if I am running an algorithm and I want to run it 500 times, how do I do it?
First off, you appear to be mixing up 'for loops' and 'while loops'.
A 'for loop' is used by giving the set of values to use, e.g. for(i in 1:10) would run 10 times changing the value of i progressively from 1 to 10 each loop.
A 'while loop' runs for as long as its condition is met, e.g.
i=1
while(i<10) i=i+1
That would run 9 times resulting in i==10.
There may be more efficient ways to accomplish your goal as indicated in the comments above but to keep in line with your original code you need something like this:
j=1
y=1 # Whatever starting value you think appropriate for your convergence.
while (j < 500) {
  y[j+1] = y[j] + 1
  j = j + 1 # without incrementing j the loop would never terminate
}
This will create a vector of values (though non-converging in this case).
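To actually run the whole iteration 500 times and collect the results in a matrix, one approach (a sketch; run_once is a hypothetical wrapper and the update rule is just the placeholder from the question) is to wrap a single run in a function and use replicate():

run_once <- function(maxiter = 10) {
  y <- numeric(maxiter)
  y[1] <- 1 # starting value, chosen arbitrarily
  for (j in 2:maxiter) {
    y[j] <- y[j - 1] + 1 # stand-in for a real Newton-Raphson update
  }
  y
}
results <- replicate(500, run_once()) # a maxiter-by-500 matrix, one run per column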
For example:
Let M be some m x n matrix where n is large enough to make manual entry impossible.
tmp_list[[1]] <- M[,1:10]
tmp_list[[2]] <- M[,11:20]
.
.
.
tmp_list[[last]] <- M[,(end - 9):end]
The problem I'm working on is a sort of Monte Carlo: repeating an experiment involving a random m x n matrix 100K times. I'm still pretty new to R. I've done it using a for loop, but it obviously took a very long time, so I'm hoping to assign each "experiment" to an element of a list and use lapply.
Let's take the easy case, and you can expand it from there. Say n = 100; first develop your start indices:
n <- 100
byParam <- 10
starts <- seq(1, n - (byParam - 1), by = byParam)
Then lapply:
tmp_list <- lapply(starts, function(startIndex) M[, startIndex:(startIndex + byParam - 1)])
This is just one way to do it; it becomes a bit more complicated if n is not a nice multiple of 10 (or whatever you set byParam equal to). If that is the case, you can develop your start and end indices, and then use mapply instead:
# given start and end indices
tmp_list <- mapply(function(startInd, endInd) {
  M[, startInd:endInd]
}, startInd = starts, endInd = ends, SIMPLIFY = FALSE) # SIMPLIFY = FALSE keeps the result as a list
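For completeness, here is one way (my own sketch, not part of the original answer) to build matching start and end indices when n is not a multiple of byParam:

n <- 105
byParam <- 10
starts <- seq(1, n, by = byParam)
ends <- pmin(starts + byParam - 1, n) # the last block is simply shorter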
Now, lapply and mapply are still iterative under the hood, so I wouldn't expect a massive improvement in time efficiency.
EDIT
After discussion in the comments, here is a solution for the entire setup, not just the question above:
tmp_list<-lapply(1:1000, function(i){
vect<-sample(c(0,1), 10*1000, replace=TRUE)
dim(vect)<-c(10, 1000)
vect
})
Let's break this down; it makes everything very simple.
We first create a random sample of 1s and 0s of length 10*1000 (the number of elements in each sub-matrix). We can then neatly convert that vector to a matrix by assigning its dim attribute to be c(10, 1000), which changes its form to have 10 rows and 1000 columns. Then we return that matrix into the list at index i. We lapply over 1:1000, i.e. iterate 1000 times.
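As a side note, the same construction can be written with matrix(), which does the reshaping in one step so the dim<- assignment is not needed:

tmp_list <- lapply(1:1000, function(i) {
  matrix(sample(c(0, 1), 10 * 1000, replace = TRUE), nrow = 10, ncol = 1000)
})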
Using R, I need to evaluate an expression of the form [using latex notation]
\frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n f(x_i-x_j),
where x_i,x_j are real (scalar) numbers and f is a nonlinear function with scalar input and output.
My current best [now using R commands] is
mat <- outer(x,x,function(y,z) f(y-z))
res <- mean(mat)
where x is a vector of length n which holds all the x_i's.
For n = 10000, this operation takes about 26 seconds on my PC, but (as expected) computation time increases heavily in n. I'd like to speed this up, mainly because I want to pass the above result to an optimizer later on. Any suggestions? Thanks!
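One possible saving, under the assumption (not stated in the question) that f is even, i.e. f(-u) = f(u): by symmetry only the upper triangle of the difference matrix is needed, roughly halving the number of f evaluations. A sketch with a placeholder f:

f <- function(u) exp(-u^2) # hypothetical nonlinear f, for illustration only
x <- runif(1000)
n <- length(x)
d <- outer(x, x, "-")
ut <- d[upper.tri(d)] # differences with i < j
res <- (2 * sum(f(ut)) + n * f(0)) / n^2 # off-diagonal pairs counted twice, diagonal once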
I am confused by the behavior of is.na() in a for loop in R.
I am trying to make a function that will create a sequence of numbers, do something to a matrix, summarize the resulting matrix based on the sequence of numbers, then modify the sequence of numbers based on the summary and repeat. I made a simple version of my function because I think it still gets at my problem.
library(plyr)
test <- function(desired.iterations, max.iterations)
{
  rich.seq <- 4:34 ## make a sequence of numbers
  details.table <- matrix(nrow=length(rich.seq), ncol=1, dimnames=list(rich.seq))
  ## generate a table where the row names are those numbers
  print(details.table) ## that's what it looks like
  temp.results <- matrix(nrow=10, ncol=2, dimnames=list(1:10))
  ## generate some sample data to summarize and fill into details.table
  temp.results[,1] <- rep(5:6, 5)
  temp.results[,2] <- rnorm(10)
  print(temp.results) ## that's what it looks like
  details.table[,1][row.names(details.table) %in% count(temp.results[,1])$x] <-
    count(temp.results[,1])$freq
  ## summarize, subset to the appropriate rows in details.table, and fill in the summary
  print(details.table)
  for (i in 1:max.iterations)
  {
    rich.seq <- rich.seq[details.table < desired.iterations | is.na(details.table)]
    ## the idea would be to keep cutting this sequence of numbers down with
    ## successive iterations until the desired number of iterations per row in
    ## details.table was reached. In other words, in the real code I'd do
    ## something to details.table in the next line
    print(rich.seq)
  }
}
##call the function
test(desired.iterations=4, max.iterations=2)
On the first run through the for loop the rich.seq looks like I'd expect it to, where 5 & 6 are no longer in the sequence because both ended up with more than 4 iterations. However, on the second run, it spits out something unexpected.
UPDATE
Thanks for your help, and also my apologies. After re-reading my original post, it is not only less than clear, but I also hadn't realized that count is part of the plyr package, which I load in my full function but wasn't loading here. I'll try to explain better.
What I have working at the moment is a function that takes a matrix, randomizes it (in any of a number of different ways), then calculates some statistics on it. These stats are temporarily stored in a table--temp.results--where temp.results[,1] is the sum of the non zero elements in each column, and temp.results[,2] is a different summary statistic for that column. I save these results to a csv file (and append them to the same file at subsequent iterations), because looping through it and rbinding hogs a lot of memory.
The problem is that certain column sums (temp.results[,1]) are sampled very infrequently. In order to sample those sufficiently requires many many iterations, and the resulting .csv files would stretch into the hundreds of gigabytes.
What I want to do is create and then update a table (details.table) at each iteration that keeps track of how many times each column sum actually got sampled. When a given element in the table reaches desired.iterations, I want it to be excluded from the vector rich.seq, so that only columns that haven't received the desired.iterations are actually saved to the csv file. The max.iterations argument will be used in a break statement in case things are taking too long.
So, what I was expecting in the example case is the exact same line for rich.seq for both iterations, since I didn't actually do anything to change it. I believe that flodel is definitely right that my problem lies in comparing a matrix (details.table) of length longer than rich.seq, leading to unexpected results. However, I don't want the dimensions of details.table to change. Perhaps I can solve the problem implementing %in% somehow when I redefine rich.seq in the for loop?
I agree you should improve your question. However, I think I can spot what is going wrong.
You compute details.table before the for loop. It is a matrix with the same length as rich.seq had when it was first initialized (length(4:34), i.e. 31).
Inside the for loop, details.table < desired.iterations | is.na(details.table) is then a logical vector of length 31. On the first loop iteration,
rich.seq <- rich.seq[details.table < desired.iterations | is.na(details.table)]
will result in reducing the length of rich.seq. But on the second loop iteration, unless details.table is redefined (not the case), you are trying to subset rich.seq by a logical vector of longer length than rich.seq. This will certainly lead to unexpected results.
You probably meant to redefine details.table as part of your for loop.
(Also I am surprised to see you never used temp.results[,2].)
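A minimal illustration (my own toy example) of what subsetting by a longer logical vector does:

v <- 1:3
v[c(TRUE, FALSE, TRUE, TRUE, FALSE)]
## [1]  1  3 NA   (TRUEs beyond length(v) produce NA, hence the surprises)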
Thanks to flodel for setting me off on the right track. It had nothing to do with is.na but rather with the lengths of the vectors I was comparing.
That said, I set the initial values of the details.table to zero to avoid the added complexity of the is.na statement.
This code works, and can be modified to do what I described above.
library(plyr)
test <- function(desired.iterations, max.iterations)
{
  rich.seq <- 4:34 ## make a sequence of numbers
  details.table <- matrix(nrow=length(rich.seq), ncol=1, dimnames=list(rich.seq))
  ## generate a table where the row names are those numbers
  details.table[,1] <- 0
  print(details.table) ## that's what it looks like
  temp.results <- matrix(nrow=10, ncol=2, dimnames=list(1:10))
  ## generate some sample data to summarize and fill into details.table
  temp.results[,1] <- rep(5:6, 5)
  temp.results[,2] <- rnorm(10)
  print(temp.results) ## that's what it looks like
  details.table[,1][row.names(details.table) %in% count(temp.results[,1])$x] <-
    count(temp.results[,1])$freq
  ## summarize, subset to the appropriate rows in details.table, and fill in the summary
  print(details.table)
  for (i in 1:max.iterations)
  {
    rich.seq <- row.names(details.table)[details.table[,1] < desired.iterations]
    print(rich.seq)
  }
}
Rather than trying to cut rich.seq down, I just redefine it every iteration based on whatever happened to details.table during the previous iteration.
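One caveat worth noting (my own observation, not from the original exchange): row.names() returns a character vector, so if rich.seq is later used as numbers it needs coercing, e.g.:

rich.seq <- as.numeric(row.names(details.table)[details.table[, 1] < desired.iterations])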