The sum of the first n odd integers - r

I am trying to create a function that takes the sum of the first n odd integers, i.e. the summation from i=1 to n of (2i-1).
If n = 1 it should output 1
If n = 2 it should output 4
I'm having problems with my for loop, which only outputs the nth term instead of the sum:
n <- 2
for (i in 1:n)
{
  y <- (2*i) - 1
}
y

In R programming we try to avoid for loops:
cumsum ( seq(1,2*n, by=2) )
Or just use 'sum' if you don't want the series of partial sums.
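For example, with n = 2 both variants give the expected results:
n <- 2
cumsum(seq(1, 2*n, by=2))  # partial sums: 1 4
sum(seq(1, 2*n, by=2))     # total: 4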

There's actually no need to use a loop or to construct the sequence of the first n odd numbers here -- this is an arithmetic series, so we know the sum of the first n elements in closed form: the sum of (2i-1) for i = 1..n is 2*(n*(n+1)/2) - n = n^2.
sum.first.n.odd <- function(n) n^2
sum.first.n.odd(1)
[1] 1
sum.first.n.odd(2)
[1] 4
sum.first.n.odd(100)
[1] 10000
This should be a good deal more efficient than any solution based on for or sum because it never computes the elements of the sequence.
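A quick sanity check that the closed form matches the explicit sum for the first few n:
n <- 1:10
all(sapply(n, function(k) sum(seq(1, 2*k, by=2))) == n^2)
## [1] TRUE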
[[Just seeing the title -- the OP apparently knows the analytic result and wanted something else...]]

Try this:
sum <- 0
n <- 2
for (i in seq(1, 2*n, 2)) {
  sum <- sum + i
}
sum
# [1] 4
But, of course, loops are comparatively slow in R, which is why one usually avoids them in favor of vectorized code.

Related

Finding the Proportion of a specific difference between the average of two vectors

I have a question for an assignment I'm doing.
Q:
"Set the seed at 1, then using a for-loop take a random sample of 5 mice 1,000 times. Save these averages.
What proportion of these 1,000 averages are more than 1 gram away from the average of x ?"
I understand that, basically, I need to write code that asks: what percentage of the "nulls" is more than 1 gram above or below the average of "x"? I'm not really certain how to write that, given that the course hasn't covered it yet but is asking us to do it. Any help on how to do so?
library(downloader)  # provides download()
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/femaleControlsPopulation.csv"
filename <- basename(url)
download(url, destfile=filename)
x <- unlist( read.csv(filename) )
set.seed(1)
n <- 1000
nulls <- vector("numeric", n)
for (i in 1:n) {
  control <- sample(x, 5)
  nulls[i] <- mean(control)
}
## I know my last line for this should be something like:
## mean(nulls "+ or - 1") > or < mean(x)
## not certain if they're asking for abs() to be involved.
## Is the question asking only for those that are 1 gram MORE than the avg of x?
Thanks for any help.
Z
I do think that the absolute distance is what they're after here.
Vectors in R are nice in that you can just perform arithmetic operations between a vector and a scalar and it will apply it element-wise, so computing the absolute value of nulls - mean(x) is easy. The abs function also takes vectors as arguments.
Logical operators (such as < and >) can be used in the same element-wise way, making it equally simple to compare the result with 1. This yields a vector of booleans (TRUE/FALSE), where TRUE means the value at that index is indeed more than 1 away. Since booleans are really just numbers (1 or 0), you can sum that vector to count the TRUE elements, or take its mean to get the proportion.
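As a toy illustration of that pattern (made-up values, deliberately not the assignment data):
vals <- c(2.3, 4.1, 0.5, 3.0)   # hypothetical sample averages
center <- 3                     # hypothetical reference mean
abs(vals - center)              # 0.7 1.1 2.5 0.0
abs(vals - center) > 1          # FALSE TRUE TRUE FALSE
mean(abs(vals - center) > 1)    # 0.5, the proportion more than 1 away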
I don't know what programming level you are on, but I hope this helps without giving the solution away completely (since you said it's for an assignment).

How to calculate sum over term including rising factorial?

I am new to programming and R, and I would like to compute the following sum: the sum over i from 0 to k of (-1)^i * choose(k, i) * ((-i*sigma))_n, where (x)_n denotes the rising factorial (Pochhammer symbol).
I used pochMpfr from the Rmpfr package for the rising factorial and a for loop to compute the sum.
B <- rep(1, k+1)
for (i in 0:k) {
  B[i+1] <- (-1)^i * choose(k,i) * pochMpfr((-i)*sigma, n)
}
sum(B)
Doing so, I get the results as a list (each element an mpfr object) and thus cannot compute the sum.
Is there a way to get the results directly as a vector, or to convert the list to a vector containing only the relevant elements?
The solution is probably quite easy, but I haven't found it while looking through the forums.
There is no need to use a for-loop, this should work:
library(Rmpfr)
# You do not define these in your question,
# so I just take some arbitrary values
k <- 10
n <- 3
sigma <- 0.3
i <- 0:k
B <- (-1)^i *choose(k,i)*pochMpfr((-i)*sigma, n)
sum(B)
## 1 'mpfr' number of precision 159 bits
## [1] 6.2977401071861993597462780570563107354142915151e-14

counting matching elements of two vectors but not including repeated elements in the count

I've searched a lot in this forum. However, I didn't find a problem similar to the one I'm facing.
My question is:
I have two vectors
x <- c(1,1,2,2,3,3,3,4,4,4,6,7,8) and z <- c(1,1,2,4,5,5,5)
I need to count how many elements the two vectors have in common, counting duplicates only as many times as they appear in both.
The answer for this problem should be 4 because:
There are two 1s, one 2, and one 4 in each vector.
Functions like match() don't help, since they keep reporting a match for a repeated number even after it has already been matched. Using unique() also alters the final answer from 4 to 3.
What I came up with was a loop that, every time it found one number in the other vector, removed it from that vector so it wouldn't be counted again.
The loop works fine for an example of this size; however, searching larger vectors numerous times makes my loop inefficient and too slow for my purposes.
system.time({
  for (n in 1:1000) {
    x <- c(1,1,2,2,3,3,3,4,4,4,6,7,8)
    z <- c(1,1,2,4,5,5,5)
    score <- 0
    for (s in z) {
      if (s %in% x) {
        x <- x[-which(x == s)[1]]  # drop the matched element so it isn't counted twice
        score <- score + 1
      }
    }
  }
})
Can someone suggest a better method?
I've tried using lapply; it is faster for short vectors, but it became slower for longer ones.
Use R's vectorization to your advantage here. There's no looping necessary.
You could use a table to look at the frequencies,
table(z[z %in% x])
#
# 1 2 4
# 2 1 1
And then take the sum of the table for the total
sum(table(z[z %in% x]))
# [1] 4
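One caveat: z[z %in% x] counts every element of z that appears anywhere in x, which happens to give the right answer for these vectors. If a value could occur more often in z than in x, a min-of-counts version would be safer. A minimal sketch of that idea, using the same x and z:
tx <- table(x)
tz <- table(z)
common <- intersect(names(tx), names(tz))
sum(pmin(tx[common], tz[common]))
# [1] 4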

How to improve performance on counting columns in a matrix which are below a threshold?

In my code I am subtracting one column of a matrix from every other column of the same matrix.
Then I count how many of the resulting difference columns contain only absolute differences smaller than r.
I'm doing this for each column of the matrix. You can see my code below. I left out the part where I put values into the matrix.
Is there any way to improve the performance of this code? I can't seem to figure out a way to make it faster.
B <- matrix(NA, m, window_step)
B_m_r <- c(1:window_step)
for (i in 1:window_step) {
  B_m_r[i] <- sum(apply(abs(B[,-i] - B[,i]), 2, function(x) max(x) < r))
}
Solution
B <- matrix(NA, m, window_step)
B_m_r <- c(1:window_step)
buffer_B <- matrix(NA, window_step-1, window_step-1)
for (i in 1:(window_step-2)) {
  buffer_B[i, i:(window_step-1)] <- apply(abs(B[,-c(1:i)] - B[,i]), 2, function(x) max(x) < r)
  B_m_r[i] <- sum(buffer_B[i, i:(window_step-1)]) + sum(buffer_B[1:i, i])
}
B_m_r[window_step] <- sum(buffer_B[1:(window_step-1), window_step-1])
B_m_r[window_step-1] <- sum(buffer_B[1:(window_step-2), window_step-2])
OK, so based on the help from Яaffael I found a solution that doesn't calculate the differences twice.
Instead I save the results of the comparison with r from previous loop iterations in the matrix buffer_B and reuse them in later iterations to compute, for each column, the number of other columns whose differences stay below r.
Now the code takes only half the time to finish.
Thanks!
You can, for example, reduce the calculation time by 50% by only checking "< r" for half of the column differences, because they are symmetric:
you are calculating both abs(first of B - last of B) and abs(last of B - first of B).
Plus, you can precalculate the whole difference matrix instead of using a for loop to set it up step by step.
# I am using single-row matrices to keep it simple
> A <- matrix(1:4,ncol=4)
> A[,1:ceiling(ncol(A)/2)]
[1] 1 2
> A[,ncol(A):(floor(ncol(A)/2)+1)]
[1] 4 3
> A <- matrix(1:5,ncol=5)
> A[,1:ceiling(ncol(A)/2)]
[1] 1 2 3
> A[,ncol(A):(floor(ncol(A)/2)+1)]
[1] 5 4 3
> abs(A[,1:ceiling(ncol(A)/2)] - A[,ncol(A):(floor(ncol(A)/2)+1)])
[1] 4 2 0
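One way to realize that precalculation (a sketch, assuming B and r as in the question, with r > 0): dist() with method = "maximum" returns the largest absolute difference between rows, so applying it to t(B) compares columns.
D <- as.matrix(dist(t(B), method = "maximum"))  # all pairwise max column differences
B_m_r <- rowSums(D < r) - 1  # -1 drops each column's comparison with itself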
When you want to speed up code in R, the first thing to try is turning loops into vectorized expressions using R functions. A loop runs in the R interpreter, while vectorized function calls let R execute essentially compiled C code.
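A quick way to see the gap for yourself (illustrative only; timings vary by machine):
x <- runif(1e7)
system.time({ s <- 0; for (v in x) s <- s + v })  # interpreted loop
system.time(sum(x))                               # one call into compiled code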

optimizing markov chain transition matrix calculations?

As an intermediate R user, I know that for loops can very often be optimized with functions like apply. However, I am not aware of functions that could optimize my current code for generating a Markov chain transition matrix, which is running quite slowly. Have I maxed out on speed, or is there something I am overlooking? I am trying to find the transition matrix for a Markov chain by counting the number of occurrences in the 24-hour periods before given alerts. The vector ids contains all possible id's (about 1700).
The original matrix looks like this, as an example:
>matrix
  id       time
   1 1376084071
   1 1376084937
   1 1376023439
   2 1376084320
   2 1372983476
   3 1374789234
   3 1370234809
And here is my code to try to handle this:
matrixtimesort <- matrix[order(-matrix$time),]
frequency <- 86400  # number of seconds in 1 day

# Initialize matrix that will contain probabilities
transprobs <- matrix(data=0, nrow=length(ids), ncol=length(ids))

# Loop through each type of event
for (i in 1:length(ids)) {
  localmatrix <- matrix[matrix$id==ids[i],]
  # Loop through each row of the event
  for (j in 1:nrow(localmatrix)) {
    localtime <- localmatrix[j,]$time
    # Find top and bottom row number defining the 1-day window
    indices <- which(matrixtimesort$time < localtime & matrixtimesort$time >= (localtime - frequency))
    # Find IDs that occur within the 1-day window
    positiveids <- unique(matrixtimesort[min(indices):max(indices),]$id)
    # Add one to each cell in the matrix that corresponds to the occurrence of an event
    for (l in 1:length(positiveids)) {
      k <- which(ids==positiveids[l])
      transprobs[i,k] <- transprobs[i,k] + 1
    }
  }
  # Divide each row by total number of occurrences to determine probabilities
  transprobs[i,] <- transprobs[i,]/nrow(localmatrix)
}

# Normalize rows so that row sums are equal to 1
normalized <- transprobs/rowSums(transprobs)
Can anyone make any suggestions to optimize this for speed?
Using nested loops seems a bad idea. Your code can be vectorized to speed it up.
For example, why find the top and bottom row numbers at all? You can simply compare the whole time column against the window boundaries: that comparison is a vectorized operation.
HTH.
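A minimal sketch of that suggestion, reusing the question's variable names (matrixtimesort, localtime, frequency): inside the inner loop, build the logical mask for the 1-day window directly and subset with it, instead of computing indices and slicing between min(indices) and max(indices).
in_window <- matrixtimesort$time < localtime &
             matrixtimesort$time >= (localtime - frequency)
positiveids <- unique(matrixtimesort$id[in_window])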
