How to create a loop calculating the summation of exponents - r

I'm trying to create a loop that will evaluate this equation.
$$ y = \sum_{j=0}^{10} x^j $$
where x = 5. I am trying to use this code:
y=0 # initialize y to 0
x = 5
for(i in 1:5){y[i] = (exp(x[0:10]))}
print(y)
but I can't seem to even get the exponents right, let alone the summation. Anyone know how to use a for loop to evaluate this sum?

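The code mixes a for loop with a sequence index, which will not produce the result you want: x[0:10] indexes the length-one vector x rather than generating exponents, and exp(x) computes e^x, not a power of x. The warning "number of items to replace is not a multiple of replacement length" arises because a vector of several values is being assigned to the single element y[i]. A loop that accumulates the sum looks like this: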
x <- 5
y <- 0
for (i in 0:10) {
  y <- y + x ^ i
}
Comparing the result with the more succinct vectorized form, sum(x^(0:10)), shows that the two agree.
> setequal(y, sum(x^(0:10)))
[1] TRUE
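As a further cross-check, here is a minimal sketch that computes the same sum three ways: the loop, the vectorized form, and the closed form of a finite geometric series, (x^11 - 1) / (x - 1). The variable names y_loop, y_vec and y_closed are just illustrative, not from the original posts.

```r
x <- 5

# 1. Accumulate the sum term by term in a loop.
y_loop <- 0
for (j in 0:10) {
  y_loop <- y_loop + x^j
}

# 2. Vectorized: raise x to every exponent at once, then sum.
y_vec <- sum(x^(0:10))

# 3. Closed form of the geometric series (valid for x != 1).
y_closed <- (x^11 - 1) / (x - 1)

print(c(y_loop, y_vec, y_closed))  # all three are 12207031
```

all.equal() is usually a more natural comparison than setequal() for numeric results, since it tolerates small floating-point differences.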

Related

R outer function not inserting elements as arguments

I am working on a script that should estimate the probability that at least 2 out of n people have birthdays within k days of each other. To estimate this I have the following function:
birthdayRangeCheck.prob = function(nPeople, seperation, nSimulations) {
  count = 0
  for (i in 1:nSimulations) {
    count = count + birthdayRangeCheck(nPeople, seperation)
  }
  return(count / nSimulations)
}
Entering single values for nPeople, seperation, and nSimulations gives me a sensible number, e.g.
birthdayRangeCheck.prob(10,4,100)
-> 0.75
However, when I want to plot the probability as a function of nPeople and seperation, I run into the following problem:
x = 1:999
y = 0:998
z = outer(X = x, Y = y, FUN = birthdayRangeCheck.prob, nSimulations = 100)
numerical expression has 576 elements: only the first used... (a lot of times)
So it seems like outer is not passing single elements of x and y, but rather the vectors themselves, which is the opposite of what outer should do, right?
Am I overlooking something? Because I can't figure out what is causing this error. (replacing FUN with e.g. sin(x+y) works like a charm so I did pin it down to the function itself. But since the function works just fine with numeric arguments I don't see why R doesn't understand to just enter elements of x and y as arguments.)
Any help would be greatly appreciated. Thanks ;)
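For what it's worth, outer() does not call FUN once per pair: it expands x and y into two long vectors and calls FUN a single time, so FUN has to be vectorized. A common workaround is to wrap a scalar-only function in Vectorize(). The sketch below assumes a stand-in for birthdayRangeCheck(), which was not shown in the question; the stand-in's logic, the wrapper name prob_vec, and the small grid are all hypothetical.

```r
# Hypothetical stand-in for the asker's birthdayRangeCheck(): TRUE if any two
# of nPeople random birthdays lie within `seperation` days of each other
# (wrap-around at the end of the year is ignored for simplicity).
birthdayRangeCheck <- function(nPeople, seperation) {
  days <- sort(sample(365, nPeople, replace = TRUE))
  any(diff(days) <= seperation)
}

# Scalar-only estimator, equivalent in spirit to the one in the question.
birthdayRangeCheck.prob <- function(nPeople, seperation, nSimulations) {
  mean(replicate(nSimulations, birthdayRangeCheck(nPeople, seperation)))
}

# Fix nSimulations in a closure and vectorize over the two remaining
# arguments, so outer() can hand over whole vectors.
prob_vec <- Vectorize(function(n, k) {
  birthdayRangeCheck.prob(n, k, nSimulations = 50)
})

set.seed(1)
x <- 2:6   # numbers of people
y <- 0:3   # separations in days
z <- outer(X = x, Y = y, FUN = prob_vec)
dim(z)     # one estimate per (nPeople, seperation) pair
```

Note that a 999 x 998 grid with 100 simulations each means roughly 10^8 calls to the check function, so a much coarser grid is probably a better starting point.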

Data frames using conditional probabilities to extract a certain range of values

I would like some help answering the following question:
Dr Barchan makes 600 independent recordings of Eric’s coordinates (X, Y, Z), selects the cases where X ∈ (0.45, 0.55), and draws a histogram of the Y values for these cases.
By construction, these values of Y follow the conditional distribution of Y given X ∈ (0.45,0.55). Use your function sample3d to mimic this process and draw the resulting histogram. How many samples of Y are displayed in this histogram?
We can argue that the conditional distribution of Y given X ∈ (0.45, 0.55) approximates the conditional distribution of Y given X = 0.5 — and this approximation is improved if we make the interval of X values smaller.
Repeat the above simulations selecting cases where X ∈ (0.5 − δ, 0.5 + δ), using a suitably chosen δ and a large enough sample size to give a reliable picture of the conditional distribution of Y given X = 0.5.
I know that for the first paragraph we want the values of x, y, z generated by sample3d(600), with the x's restricted to the range 0.45-0.55. Is there a way to code this (maybe with an if statement) so that I keep the rows whose x is in this range and discard the rest of the 600? Also, does anyone have any hints for the conditional probability part in the third paragraph?
sample3d = function(n)
{
  df = data.frame()
  while(n>0)
  {
    X = runif(1,-1,1)
    Y = runif(1,-1,1)
    Z = runif(1,-1,1)
    a = X^2 + Y^2 + Z^2
    if( a < 1 )
    {
      b = (X^2+Y^2+Z^2)^(0.5)
      vector = data.frame(X = X/b, Y = Y/b, Z = Z/b)
      df = rbind(vector,df)
      n = n- 1
    }
  }
  df
}
sample3d(n)
Any help would be appreciated, thank you.
Your function produces a data frame. The part of the question that asks you to find the values in a given range can be solved by filtering the data frame. Notice that you're looking for an open interval (the endpoints aren't included).
df <- sample3d(600)
df[df$X > 0.45 & df$X < 0.55,]
Pay attention to the trailing comma: it selects rows and keeps all columns.
You can use a dplyr solution as well, but don't use the helper between(), since it tests a closed interval (endpoints included), whereas you need an open one.
library(dplyr)
filter(df, X > 0.45 & X < 0.55)
For the remainder of your assignment, see what you can figure out; if you run into a specific problem, Stack Overflow can help you.
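To make the filtering step concrete, here is a rough sketch of the whole pipeline. It assumes a compact, batch-based rewrite of sample3d() (same idea as the asker's version: rejection-sample inside the unit ball, then scale onto the unit sphere); the values delta = 0.01 and 100000 samples are arbitrary illustrative choices, not recommendations from the original answer.

```r
# Assumed compact equivalent of the asker's sample3d(): draw candidate points
# in the cube, keep those inside the unit ball, scale them onto the sphere.
sample3d <- function(n) {
  out <- matrix(numeric(0), ncol = 3)
  while (nrow(out) < n) {
    p <- matrix(runif(3 * n, -1, 1), ncol = 3)
    r2 <- rowSums(p^2)
    keep <- r2 < 1
    out <- rbind(out, p[keep, , drop = FALSE] / sqrt(r2[keep]))
  }
  out <- out[seq_len(n), , drop = FALSE]
  data.frame(X = out[, 1], Y = out[, 2], Z = out[, 3])
}

set.seed(42)

# 600 recordings, keep only rows with X in the open interval (0.45, 0.55).
df <- sample3d(600)
sel <- df[df$X > 0.45 & df$X < 0.55, ]
nrow(sel)  # number of Y values that end up in the histogram
hist(sel$Y, main = "Y given X in (0.45, 0.55)", xlab = "Y")

# Narrower window, larger sample: closer to Y given X = 0.5.
delta <- 0.01
big <- sample3d(100000)
near <- big[abs(big$X - 0.5) < delta, ]
hist(near$Y, breaks = 40, main = "Y given X near 0.5", xlab = "Y")
```

The trade-off is that a smaller delta keeps fewer rows, so the total sample size has to grow to keep the histogram smooth.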

How to plot function with 2 variables and involving factorials in R

I could not find a viable solution to this problem (and I am a beginner in R).
I have an equation as shown below
where n and K are constants. a and b are the variables.
How do I generate a 2-dimensional plot for the above in R?
Thanks in advance.
factorialfunction <- function(a, b, n, K){
  K*(b^a)*((2+b)^(n+a))
}
Klist = c(1,5,10,50,100,200)
nlist = c(1,5,10,50,100,200)
#note that the n and K values are recycled, make them whatever you wish, they are constants,
#while a and b take on any values, here 100 values between zero and one
res <- mapply(factorialfunction, a = seq(.01,1,by=.01),
              b = seq(.01,1,by=.01), n = rep(nlist,each = 100), K = rep(Klist, each=100))
#Then you can plot this six times.
#allow six plots on the panel if you want
par(mfrow = c(3,2))
#loop through different plots
for (i in 1:6)
  plot(1:100, res[1:100 + (i-1)*100])
Note: in this code I chose a and b to be between zero and one. I am not familiar with this function, but it looks like some kind of Beta function.
You can generate more than 6 plots by changing Klist and nlist, the par settings, and the range of the for loop.
The result is a panel of six plots. The code can be customized to produce plots for whichever values of n, K, a, and b you want.
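If a and b should vary independently rather than together, one option is to evaluate the function on a grid with outer() and draw it as a contour or perspective plot. This is only a sketch: it reuses the function body from the answer above (the asker's actual equation was not preserved), and n = 5, K = 10 and the 50-point grids are arbitrary choices.

```r
# Function body copied from the answer above; whether it matches the asker's
# equation is unknown.
factorialfunction <- function(a, b, n, K) {
  K * (b^a) * ((2 + b)^(n + a))
}

a <- seq(0.01, 1, length.out = 50)
b <- seq(0.01, 1, length.out = 50)

# z[i, j] holds the value at (a[i], b[j]); n and K are passed through as
# constants. The function uses only vectorized arithmetic, so outer() can
# call it directly.
z <- outer(a, b, factorialfunction, n = 5, K = 10)

contour(a, b, z, xlab = "a", ylab = "b")
persp(a, b, z, theta = 30, phi = 25, xlab = "a", ylab = "b", zlab = "value")
```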

if x>0 use f(x) ,if x<0 use g(x) and plot all y in one plot

How would I write a function which chooses between two functions depending on whether the argument is larger or smaller than zero, and writes the generated values into a vector y so that one can plot(x,y)? Can you please tell me whether this is the right approach or not:
x <- runif(20,-20,20)
y <- numeric()
f <- function(x){
  if(x>0){y <- c(y,x^2)}
  if(x<0){y <- c(y,x^3)}
}
for(i in x){f(x)}
plot(x,y)
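As far as I can see, there are a few problems with your code:
Your function f does not return any meaningful value. The assignment to y inside f is not global; it stays within the scope of the function.
The loop calls f(x) with the whole vector instead of f(i) with the current element, so if() is handed a vector rather than a single value.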
Many operations in R are vectorized (i.e. they are performed on whole vectors instead of individual elements). This is an important feature of good R code, but your code does not take advantage of it. In short: when x is a vector, x > 0 returns a logical vector in which the condition is checked for every element of x, and x^2 returns a numeric vector in which every element is the square of the corresponding element of x. ifelse is also vectorized, so it checks the condition for every element of the vector. Knowing that, you can get rid of your function and loop and write y <- ifelse(x<0,x^3,x^2).
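To illustrate the scoping point, here is a small sketch. The names f_local and f_scalar are made up for the example; the first mimics the original function, the second returns a value so the caller can collect the results.

```r
y <- numeric()

# Mimics the original f: the assignment creates a local y inside the
# function, so the global y is never modified.
f_local <- function(v) {
  y <- c(y, v^2)
  invisible(NULL)
}
f_local(3)
length(y)  # still 0

# Scalar function that returns its result instead of assigning to y.
f_scalar <- function(v) {
  if (v > 0) v^2 else v^3
}

x <- c(-2, 3)
y <- sapply(x, f_scalar)  # apply element by element and collect the results
y                         # -8  9
```

For something this simple the ifelse() one-liner is shorter, but the sapply() pattern still works when the per-element function cannot easily be vectorized.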
You don't need the loop, and in this case it is not even necessary to define f(x) and to store the output in a vector y.
Something like
x <- runif(20,-20,20)
plot(x,ifelse(x<0,x^3,x^2))
should do. The second argument can take several versions, as discussed in the comments.
If you want to store the function and the data for later use, try
x <- runif(20,-20,20)
f <- function(x) ifelse (x < 0, x^3, x^2)
y <- f(x)
plot (x,y)
Here is another option...
x <- runif(20,-20,20)
y <- x^2*(x>0)+x^3*(x<=0)
plot(x,y)
R evaluates (x > 0) to a logical vector (TRUE or FALSE) and automatically coerces it to 1 and 0 when it is used in arithmetic.
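A quick sketch of that coercion, using a small hand-picked x (an assumption for illustration) instead of random values so the results are easy to follow:

```r
x <- c(-2, -1, 0, 1, 3)

(x > 0)      # FALSE FALSE FALSE  TRUE  TRUE
(x > 0) * 1  # 0 0 0 1 1: logical values become 0/1 in arithmetic

# Both formulations give the same y for these inputs.
y_ifelse <- ifelse(x < 0, x^3, x^2)
y_arith  <- x^2 * (x > 0) + x^3 * (x <= 0)
identical(y_ifelse, y_arith)  # TRUE
```

One caveat with the arithmetic form: both branches are always evaluated and multiplied, so if one branch can produce NaN or Inf for some elements (log(x) for negative x, say), multiplying by 0 does not remove it, whereas ifelse() only picks the value from the selected branch.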

questions on Vectorize of R

I have a question regarding the following R code segment
n <- 20
theta <- 5
x <- runif(n)
y <- x * theta + rnorm(n)
empirical.risk <- function(b) {
  mean((y-b*x)^2)
}
true.risk <- function(b) {
  1 + (theta - b)^2 * (0.5^2 + 1 / 12)
}
curve(Vectorize(empirical.risk)(x), from = 0, to = 2 * theta,
xlab = "regression slope", ylab = "MSE risk")
curve(true.risk, add = TRUE, col = "grey")
This code segment makes use of Vectorize, but I do not quite understand how it works. In particular, in curve(true.risk, add = TRUE, col = "grey") no arguments are passed to true.risk at all.
So first, the curve(...) function works by forming a vector of length n (default n=101) of values between from and to (0 and 10 in your example). Then it passes that vector to the function defined in the first argument and plots the returned vector. So the first argument to curve(...) has to be "vectorized"; in other words it has to take a vector as an argument and return a vector of the same length.
Your function, empirical.risk(...) is not vectorized: if you pass it a scalar, you get what you would expect, but if you pass it a vector (e.g., multiple values of b), you get back a scalar. The reason is actually quite subtle and has to do with the use of (y-b*x). If you pass a vector, R tries to form the product of the vector b and the vector x to create a new vector, which it then subtracts from the vector y. Then it takes the mean of that vector and returns a single value. What you want is a vector that is the result of running empirical.risk(...) on all the values of b in succession. This is what Vectorize does.
Another way to think about it is that, for a function of one argument, Vectorize(...) behaves like wrapping your function with sapply(...):
f <- Vectorize(empirical.risk)
f(1:5)
# [1] 5.663989 3.660942 2.279632 1.520057 1.382219
sapply(1:5,empirical.risk)
# [1] 5.663989 3.660942 2.279632 1.520057 1.382219
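Regarding the second part of your question, why the last line of code works: curve(...) accepts a function name as its first argument and supplies the x values itself, so nothing needs to be passed to true.risk explicitly. When add=TRUE, curve(...) takes its default from and to from the x-axis limits of the existing plot, so the second call covers the same range that the first call set up (with the default n).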
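The difference between the two risk functions can be checked directly. The sketch below reuses the definitions from the question; the seed and the test vector b_grid are assumptions added for reproducibility (its length, 5, divides n = 20, so R recycles it silently).

```r
set.seed(1)  # assumed seed, only for reproducibility
n <- 20
theta <- 5
x <- runif(n)
y <- x * theta + rnorm(n)

empirical.risk <- function(b) mean((y - b * x)^2)
true.risk <- function(b) 1 + (theta - b)^2 * (0.5^2 + 1 / 12)

b_grid <- c(1, 2, 4, 5, 10)

# Not vectorized: b is recycled against x and mean() collapses to one number.
length(empirical.risk(b_grid))             # 1

# Vectorized wrapper: one risk value per element of b_grid.
length(Vectorize(empirical.risk)(b_grid))  # 5

# true.risk uses only element-wise arithmetic on b, so it is already
# vectorized and can be handed to curve() as is.
length(true.risk(b_grid))                  # 5
```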
