Beginner in R - Writing loops and functions

I am new to R (I only started learning a few weeks ago) and I need some help with a question on my homework. I have asked my professor several times already and mostly that's resulted in me being more confused.
The question is as follows:
Recall that the Taylor expansion of log(1+x) is:
log(x+1) = sum((-1)^(i+1)*x^i/i)
How many terms do you need to get within 10^(−6) of the correct solution when x = 0.99?
My professor gave me some hints:
n should not be equal to length of x. n should be figured out from the loop. It is the iteration number of the last loop.
the condition of the while loop should depend on the difference between the approximated value and the true value of the function. Not on n any more.
in each iteration, you need to calculate (-1)^(i+1)*x^i/i and add it to the sum of the previous iterations so that you accumulate the sum. Check the example about factorial function in class.
You should include your while loop in a function. The function should have a single argument x.
I'm not asking for anyone to give me the answer, but I would appreciate any help or advice. Thank you!
Below is some code of what I've tried. It is incomplete still and I'm not sure what is missing:
taylorexp <- function(x){
i = 1
approximation = 1
while((log(1+x) - approximation) > 10^(-6)){
z=sum((-1)^(i+1)*x^i/i)
i=i+1
}
return(i)
}
taylorexp(0.99)

Here is an example with repeat
s <- 0
i <- 1
x <- 0.99
repeat{
if (abs(log(1 + x) - s) <= 1e-6) break
s <- s + (-1)**(i + 1) * x**i / i
i <- i + 1
}
or with while
s <- 0
i <- 1
x <- 0.99
while(abs(log(1 + x) - s) > 1e-6) {
s <- s + (-1)**(i + 1) * x**i / i
i <- i + 1
}
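and we will get the value below (note that i has already been incremented past the last term added, so the number of terms actually summed is i - 1)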
> i
[1] 661
Update
You can wrap all the required information in a list as the output of function taylorexp (here n is the final value of i, which is one more than the number of terms summed), e.g.,
taylorexp <- function(x, tol) {
s <- 0
i <- 1
while (abs(log(1 + x) - s) > tol) {
s <- s + (-1)**(i + 1) * x**i / i
i <- i + 1
}
list(approx = s, err = log(1 + x) - s, n = i)
}
such that
> taylorexp(0.99,1e-6)
$approx
[1] 0.6881336
$err
[1] 9.911419e-07
$n
[1] 661
> taylorexp(0.99,1e-7)
$approx
[1] 0.6881345
$err
[1] 9.966883e-08
$n
[1] 863
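Because the loops above increment i after adding each term, the final i is one more than the number of terms summed. A minimal sketch that counts the terms directly is shown here; the name count_terms and the counter n are illustrative and not part of the answer above.

```r
# Count how many Taylor terms of log(1 + x) are needed to get within tol.
# count_terms is a hypothetical helper name used only for this sketch.
count_terms <- function(x, tol = 1e-6) {
  target <- log(1 + x)
  s <- 0   # running partial sum
  n <- 0   # number of terms added so far
  while (abs(target - s) > tol) {
    n <- n + 1
    s <- s + (-1)^(n + 1) * x^n / n
  }
  n
}

count_terms(0.99)
```

Given the output above (i is 661 when the loop exits), this should return 660 for x = 0.99 and a tolerance of 1e-6.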

Answer Update:
taylor = function(x,n){
out = 0
for(i in 1:n){
out = out + ((-1)^(i+1))*(x^i/i)
}
out
}
true_val = log(1 + 0.99)
true_val
approx_val = taylor(0.99, 10)
approx_val
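The fixed-n version above can also be turned into a search for the required n by checking every partial sum at once. The sketch below assumes that 1000 terms is a generous enough upper bound for x = 0.99 (an assumption, not something stated in the question) and uses cumsum to build all partial sums in one pass.

```r
x <- 0.99
i <- 1:1000                       # assumed upper bound on the number of terms
terms <- (-1)^(i + 1) * x^i / i   # individual Taylor terms
partial_sums <- cumsum(terms)     # partial_sums[n] is the sum of the first n terms

# TRUE wherever the partial sum is within the tolerance of the true value
within_tol <- abs(log(1 + x) - partial_sums) <= 1e-6

n_needed <- which(within_tol)[1]  # first n that meets the tolerance
n_needed
```

If no partial sum meets the tolerance, which() returns an empty vector and n_needed is NA, which is a signal that the upper bound needs to be raised.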

Related

Area Under the Curve using Simpson's rule in R

I would like to compute the Area Under the Curve defined by a set of experimental values. I created a function to calculate an approximation of the AUC using Simpson's rule, as I saw in this post. However, the function only works when it receives a vector of odd length. How can I modify the code to add the area of the last trapezoid when the input vector has an even length?
AUC <- function(x, h=1){
# AUC function computes the Area Under the Curve of a time series using
# Simpson's rule (numerical method).
# https://link.springer.com/chapter/10.1007/978-1-4612-4974-0_26
# Arguments
# x: (vector) time series values
# h: (int) temporal resolution of the time series. default h=1
n = length(x)-1
xValues = seq(from=1, to=n, by=2)
sum <- list()
for(i in 1:length(xValues)){
n_sub <- xValues[[i]]-1
n <- xValues[[i]]
n_add <- xValues[[i]]+1
v1 <- x[[n_sub+1]]
v2 <- x[[n+1]]
v3 <- x[[n_add+1]]
s <- (h/3)*(v1+4*v2+v3)
sum <- append(sum, s)
}
sum <- unlist(sum)
auc <- sum(sum)
return(auc)
}
Here a data example:
smoothed = c(0.3,0.317,0.379,0.452,0.519,0.573,0.61,0.629,0.628,0.613,0.587,0.556,0.521,
0.485,0.448,0.411,0.363,0.317,0.273,0.227,0.185,0.148,0.12,0.103,0.093,0.086,
0.082,0.079,0.076,0.071,0.066,0.059,0.053,0.051,0.052,0.057,0.067,0.081,0.103,
0.129,0.165,0.209,0.252,0.292,0.328,0.363,0.398,0.431,0.459,0.479,0.491,0.494,
0.488,0.475,0.457,0.43,0.397,0.357,0.316,0.285,0.254,0.227,0.206,0.189,0.181,
0.171,0.157,0.151,0.162,0.192,0.239)
One recommended way to handle an even number of points while preserving accuracy is to combine Simpson's 1/3 rule with Simpson's 3/8 rule. The 3/8 rule covers three intervals (four points), so it can absorb the odd interval that the 1/3 rule cannot. Such approaches are described in engineering textbooks on numerical methods.
However, as a practical matter, you can write a code chunk to check the data length and add a single trapezoid at the end, as was suggested in the last comment of the post to which you linked. I wouldn't assume that it is necessarily as precise as combining Simpson's 1/3 and 3/8 rules, but it is probably reasonable for many applications.
I would double-check my code edits below, but this is the basic idea.
AUC <- function(x, h=1){
# AUC function computes the Area Under the Curve of a time series using
# Simpson's rule (numerical method).
# https://link.springer.com/chapter/10.1007/978-1-4612-4974-0_26
# Arguments
# x: (vector) time series values
# h: (int) temporal resolution of the time series. default h=1
#jh edit: check for even data length
#and chop off last data point if even
nn = length(x)
if(length(x) %% 2 == 0){
xlast = x[length(x)]
x = x[-length(x)]
}
n = length(x)-1
xValues = seq(from=1, to=n, by=2)
sum <- list()
for(i in 1:length(xValues)){
n_sub <- xValues[[i]]-1
n <- xValues[[i]]
n_add <- xValues[[i]]+1
v1 <- x[[n_sub+1]]
v2 <- x[[n+1]]
v3 <- x[[n_add+1]]
s <- (h/3)*(v1+4*v2+v3)
sum <- append(sum, s)
}
sum <- unlist(sum)
auc <- sum(sum)
##jh edit: add trapezoid for last two data points to result
if(nn %% 2 == 0){
auc <- auc + (x[length(x)] + xlast)/2 * h
}
return(auc)
}
sm = smoothed[-length(smoothed)]
length(sm)
[1] 70
#even data as an example
AUC(sm)
[1] 20.17633
#original odd data
AUC(smoothed)
[1] 20.389
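For the odd-length case, the loop in AUC amounts to applying the weights 1, 4, 2, 4, ..., 4, 1 to the data. A compact vectorized sketch of that same composite rule is shown here; simpson_odd is an illustrative name, and the function deliberately rejects even-length input rather than handling it.

```r
# Composite Simpson's 1/3 rule for an odd number of equally spaced points.
# simpson_odd is a hypothetical name used only for this sketch.
simpson_odd <- function(x, h = 1) {
  n <- length(x)
  stopifnot(n >= 3, n %% 2 == 1)
  idx <- seq_len(n)
  w <- ifelse(idx %% 2 == 0, 4, 2)  # interior weights alternate 4, 2, 4, ...
  w[c(1, n)] <- 1                   # end points get weight 1
  h / 3 * sum(w * x)
}

# Simpson's rule is exact for cubics: the integral of t^3 on [0, 4] is 64
simpson_odd((0:4)^3)
```

A cubic is a convenient sanity check because any error in the weights shows up immediately as a deviation from the known integral.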
There may be a good reason for you to prefer using Simpson's rule, but if you're just looking for a quick and efficient estimate of AUC, the trapezoid rule is far easier to implement, and does not require an even number of breaks:
AUC <- function(x, h = 1) sum((x[-1] + x[-length(x)]) / 2 * h)
AUC(smoothed)
#> [1] 20.3945
Here, I show example code that uses the Simpson's 1/3 and 3/8 rules in tandem for the numerical integration of data. As always, the usual caveats about the possibility of coding errors or compatibility issues apply.
The output at the end compares the percent error of this algorithm with that of the trapezoidal rule, each measured against the value returned by R's integrate function.
#Algorithm adapted from:
#Numerical Methods for Engineers, Seventh Edition,
#By Chapra and Canale, page 623
#Modified to accept data instead of functional values
#Modified by: Jeffrey Harkness, M.S.
##Begin Simpson's rule function code
simp13 <- function(dat, h = 1){
ans = 2*h*(dat[1] + 4*dat[2] + dat[3])/6
return(ans)}
simp13m <- function(dat, h = 1){
summ <- dat[1]
n <- length(dat)
nseq <- seq(2,(n-2),2)
for(i in nseq){
summ <- summ + 4*dat[i] + 2*dat[i+1]}
summ <- summ + 4*dat[n-1] + dat[n]
result <- (h*summ)/3
return(result)}
simp38 <- function(dat, h = 1){
ans <- 3*h*(dat[1] + 3*sum(dat[2:3]) + dat[4])/8
return(ans)}
simpson = function(dat, h = 1){
hin = h
len = length(dat)
comp <- len %% 2
##number of segments
if(len == 2){
ans = sum(dat)/2*h} ##n = 2 is the trapezoidal rule
if(len == 3){
ans = simp13(dat, h = hin)}
if(len == 4){
ans = simp38(dat,h = hin)}
if(len == 6){
ans <- simp38(dat[1:4],h = hin) + simp13(dat[4:len],h = hin)}
if(len > 6 & comp == 0){
ans = simp38(dat[1:4],h = hin) + simp13m(dat[4:len],h = hin)}
if(len >= 5 & comp == 1){
ans = simp13m(dat,h = hin)}
return(ans)}
##End Simpson's rule function code
This next section of code shows the performance comparison. This code can easily be altered for different test functions and cases.
The precision difference tends to change with the sample size and test function used; this example is not intended to imply that the difference is always this pronounced.
#other algorithm for comparison purposes, from Allan Cameron above
oa <- function(x, h = 1) sum((x[-1] + x[-length(x)]) / 2 * h)
#Testing and algorithm comparison code
simans = NULL; oaans = NULL; simerr = NULL; oaerr = NULL; mp = NULL
for( j in 1:10){
n = j
#f = function(x) cos(x) + 2 ##Test functions
f = function(x) 0.2 + 25*x - 200*x^2 + 675*x^3 - 900*x^4 + 400*x^5
a = 0;b = 10
h = (b-a)/n
datain = seq(a,b,by = h)
preans = integrate(f,a,b)$value #precise numerical estimate of test function
simans[j] = simpson(f(datain), h = h)
oaans[j] = oa(f(datain), h = h)
(simerr[j] = abs(simans[j] - preans)/preans * 100)
(oaerr[j] = abs(oaans[j] - preans)/preans * 100)
mp[j] = simerr[j] < oaerr[j]
}
(outframe = data.frame("simpsons percent diff" = simerr,"trapezoidal percent diff" = oaerr, "more precise?" = mp, check.names = F))
simpsons percent diff trapezoidal percent diff more precise?
1 214.73489738 214.734897 FALSE
2 15.07958148 64.993410 TRUE
3 6.70203621 29.816799 TRUE
4 0.94247384 16.955208 TRUE
5 0.54830021 10.905620 TRUE
6 0.18616767 7.593825 TRUE
7 0.12051767 5.588209 TRUE
8 0.05890462 4.282980 TRUE
9 0.04087107 3.386525 TRUE
10 0.02412733 2.744500 TRUE

If else (set maximum to end at a set value)

How can I set a loop to run to a maximum value (Dend)?
I just want to see how fast and deep it will grow but I want to set a maximum to say that it can't grow beyond Dend.
I get an error stating
In if (D == Dend) { :
the condition has length > 1 and only the first element will be used
Code
D0 <- 0
Dend <- 4200
r <- 5 # growth rate
days <- 1000
n_steps <- days*1
D <- rep(NA, n_steps+1)
D <- D0
for (time in seq_len(n_steps)){
if (D == Dend){
break} else
D[time + 1] <- r + D[time]
}
D
plot(-D, las=1)
If you want a for loop, it might be something like below
for (time in seq_len(n_steps)){
if (tail(D,1) >= Dend) break
D[time + 1] <- r + D[time]
}
I think what you want can be achieved with seq without any loops :
D <- seq(D0, Dend, r)
If you have to use for loop you can use :
for (time in seq_len(n_steps)){
temp <- r + D[time]
if (temp >= Dend) break
D[time + 1] <- temp
}
We can also use a while loop :
i <- 1
while(TRUE) {
temp <- r + D[i]
if(temp > Dend) break
i <- i + 1
D[i] <- temp
}
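If the goal is for the series to keep its full length and simply level off at Dend, rather than stop early, a vectorized sketch with pmin is one option. It assumes the same constant growth rate r as in the question.

```r
D0 <- 0
Dend <- 4200
r <- 5            # growth rate per step
n_steps <- 1000

# Linear growth from D0, capped element-wise at Dend
D <- pmin(D0 + r * seq(0, n_steps), Dend)

length(D)   # n_steps + 1 values
max(D)      # never exceeds Dend
```

Unlike the loop versions, this keeps one value per time step, so plot(D) shows the plateau once the cap is reached.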

trying to reverse complicated function in R

myfnS <- function(S,T,pH,D,f,r,I0) {
f1 <- 0.78*sqrt(S/35)*exp(T/26)
f2 <- 42*exp(T/17)
alpha <- 0.106*((f1*f^2)/(f1^2 + f^2))*exp((pH - 8)/0.56) + 0.52*(1 + T/43)*(S/35)*((f2*f^2)/(f2^2 + f^2))*exp(-D/6) + 0.00049*f^2*exp(-(T/27 + D/16))
TLsph <- (20*log(r*1000, base = 10))
I <- I0 - TLsph - ((alpha)*r)
return(I)
}
I'm trying to plug in a value of I to get a value for r, but I'm not sure how to do that.
The values of the other arguments remain fixed.
For example, with (I=50, S=34, T=10.5, pH=8.1, D=0.0395, f=10.3, I0=192), how do I find the value of r?
In R there's no way that I know of to run a function in reverse. However, you could use an iterative search, similar in spirit to Newton's method, to zero in on a value of r that gives you an I of 50.
First though, the function needs to be cleaned up. Using T as an argument is a problem since that's also short for TRUE, so I changed it to tt. I also added in * operators where implied before.
myfnS <- function(S,tt,pH,D,f,r,I0) {
f1 <- 0.78*sqrt(S/35)*exp(tt/26)
f2 <- 42*exp(tt/17)
alpha <- 0.106*((f1*f^2)/(f1^2 + f^2))*exp((pH - 8)/0.56) +
0.52*(1 + (tt)/43)*(S/35)*((f2*f^2)/(f2^2 + f^2))*exp(-D/6) +
0.00049*f^2*exp(-(tt/27 + D/16))
TLsph <- (20*log(r*1000, base = 10))
I <- I0 - TLsph - ((alpha)*r)
return(I) }
The function below then steps through values of r to get closer to an I of 50. As r goes up, I goes down. Once a value of r is found that brings I within 1 of 50, that r is returned. The threshold can be tightened to get a more precise estimate, although the step size in r would then need to shrink as well.
iterateR <- function(I=50, r=1){
# gap between the target I and the value myfnS gives for the current r
gap <- I - myfnS(r=r, S=34, tt=10.5, pH=8.1, D=0.0395, f=10.3, I0=192)
if (abs(gap) < 1) {
return(r)
}
else if (gap > 0) {
# myfnS(r) is too low; I falls as r rises, so step r down
iterateR(I=I, r=r-1)
}
else {
# myfnS(r) is too high, so step r up
iterateR(I=I, r=r+1)
}
}
iterateR(I=50, r=1)
#> 47
myfnS(r=47, S=34, tt=10.5, pH=8.1, D=0.0395, f=10.3, I0=192)
#> 50.8514607488626
An r of 47 gives an I of 50.85
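Base R's uniroot can do this kind of one-dimensional root search directly. The sketch below restates the model so that it runs on its own; the bracket c(1, 100) and the tolerance are assumptions chosen for this example, not values from the question.

```r
# Same model as above, with tt in place of T
myfnS <- function(S, tt, pH, D, f, r, I0) {
  f1 <- 0.78 * sqrt(S / 35) * exp(tt / 26)
  f2 <- 42 * exp(tt / 17)
  alpha <- 0.106 * ((f1 * f^2) / (f1^2 + f^2)) * exp((pH - 8) / 0.56) +
    0.52 * (1 + tt / 43) * (S / 35) * ((f2 * f^2) / (f2^2 + f^2)) * exp(-D / 6) +
    0.00049 * f^2 * exp(-(tt / 27 + D / 16))
  I0 - 20 * log10(r * 1000) - alpha * r
}

target <- 50

# uniroot needs the function to change sign over the interval, so I must be
# above the target at one end and below it at the other (assumed bracket).
sol <- uniroot(
  function(r) myfnS(S = 34, tt = 10.5, pH = 8.1, D = 0.0395, f = 10.3,
                    r = r, I0 = 192) - target,
  interval = c(1, 100), tol = 1e-9
)
sol$root
```

Since I is about 50.85 at r = 47 and falls as r rises, the root should lie a little above 47.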

R: How can I calculate large numbers in n-choose-k? [duplicate]

For a class assignment, I need to create a function that calculates n Choose k. I did just that, and it works fine with small numbers (e.g. 6 choose 2), but I'm supposed to get it work with 200 choose 50, where it naturally doesn't.
The answer is too large and R outputs NaN or Inf, saying:
> q5(200, 50)
[1] "NaN"
Warning message:
In factorial(n) : value out of range in 'gammafn'
I tried using logs and exponents, but it doesn't cut it.
q5 <- function (n, k) {
answer <- log(exp( factorial(n) / ( (factorial(k)) * (factorial(n - k)) )))
paste0(answer)
}
The answer to the actual question is that R cannot show numbers it cannot represent, and some of the terms in your equation are too big to represent as doubles, so the calculation fails. However, there are approximations to the factorial that can be used; they work with logarithms, which grow much more slowly.
The most famous one, Stirling's approximation, was not accurate enough, but Ramanujan's approximation came to the rescue :)
ramanujan <- function(n){
n*log(n) - n + log(n*(1 + 4*n*(1+2*n)))/6 + log(pi)/2
}
nchoosek <- function(n,k){
factorial(n)/(factorial(k)*factorial(n-k))
}
bignchoosek <- function(n,k){
exp(ramanujan(n) - ramanujan(k) - ramanujan(n-k))
}
nchoosek(20,5)
# [1] 15504
bignchoosek(20,5)
# [1] 15504.06
bignchoosek(200,50)
# [1] 4.538584e+47
You can try this too:
q5 <- function (n, k) {
# nchoosek = (n-k+1)*(n-k+2)*...*n / (1*2*...*k)
return(prod(sapply(1:k, function(i)(n-k+i)/(i))))
}
q5(200, 50)
#[1] 4.538584e+47
or in log domain
q5 <- function (n, k) {
# ln (nchoosek) = ln(n-k+1) + ln(n-k+2) + ...+ ln(n) - ln(1) -ln(2) - ...- ln(k)
return(exp(sum(sapply(1:k, function(i)(log(n-k+i) - log(i))))))
}
q5(200, 50)
#[1] 4.538584e+47
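Base R also ships choose() and lchoose(), which compute the binomial coefficient and its natural logarithm directly. A class assignment may not allow them, but they make a handy cross-check for a hand-written version:

```r
choose(20, 5)           # small case
choose(200, 50)         # fits in a double, so there is no overflow
lchoose(200, 50)        # natural log of the coefficient
exp(lchoose(200, 50))   # back-transformed; agrees with choose() up to rounding
```

Working with lchoose() is useful when the coefficient itself would overflow a double but its logarithm is all that a later calculation needs.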
The packages for large numbers:
Brobdingnag package for "Very large numbers in R":
https://cran.r-project.org/web/packages/Brobdingnag/index.html
Paper: https://www.researchgate.net/publication/251996764_Very_large_numbers_in_R_Introducing_package_Brobdingnag
library(Brobdingnag)
googol <- as.brob(10)^100 # googol:=10^100
googol
# [1] +exp(230.26) # exponential notation is convenient for very large numbers
gmp package for multiple Precision Arithmetic (big integers and rationals, prime number tests, matrix computation):
https://cran.r-project.org/web/packages/gmp/index.html
This solution calculates the complete row of the Pascal triangle:
x <- 1
print(x)
for (i in 1:200) { x <- c(0, x) + c(x, 0); print(x) }
x[51] ### 200 choose 50
## > x[51]
## [1] 4.538584e+47
(as I proposed for How would you program Pascal's triangle in R? )
If you want to speed up the code, leave out the print(x) calls (output is a relatively slow operation).
To put the code in a function we can do
nchoosek <- function(n,k) {
x <- 1
for (i in 1:n) x <- c(0, x) + c(x, 0)
x[k+1] ### n choose k
}
nchoosek(200, 50) ### testing the function
## [1] 4.538584e+47
Here is a more refined version of my function:
nchoosek <- function(n, k) {
if (k==0) return(1)
if (k+k > n) k <- n-k
if (k==0) return(1)
x <- 1
for (i in 1:k) x <- c(0, x) + c(x, 0)
for (i in 1:(n-k)) x <- x + c(0, head(x, -1))
tail(x, 1)
}
nchoosek(200, 50) ### testing the function
## [1] 4.538584e+47

Newton's Method in R

I have an issue when trying to implement the code for Newton's Method for finding the value of the square root (using iterations). I'm trying to get the function to stop printing the values once a certain accuracy is reached, but I can't seem to get this working. Below is my code.
MySqrt <- function (x, eps = 1e-6, itmax = 100, verbose = TRUE){
i <- 1
myvector <- integer(0)
GUESS <- readline(prompt="Enter your guess: ")
GUESS <- as.integer(GUESS)
while(i <= itmax){
GUESS <- (GUESS + (x/GUESS)) * 0.5
myvector <- c(myvector, GUESS)
if (abs(GUESS-x) < eps) break
i <- i + 1
}
myvector
}
Why won't the if-statement work?
This should work:
MySqrt <- function (x, eps = 1e-6, itmax = 100, verbose = TRUE){
i <- 1
myvector <- vector(mode='numeric',itmax) ## better to allocate memory
GUESS <- readline(prompt="Enter your guess: ")
GUESS <- as.numeric(GUESS)
myvector[i] <- GUESS
while(i <= itmax){
GUESS <- (GUESS + (x/GUESS)) * 0.5
if (abs(GUESS-myvector[i]) < eps) break
i <- i + 1
myvector[i] <- GUESS
}
myvector[seq(i)]
}
MySqrt(2)
Enter your guess: 1.4
[1] 1.400000 1.414286 1.414214
UPDATE:
Please see @RichieCotton's comment to @agstudy's answer. I agree with Richie, and in fact it makes more sense to use @agstudy's approach.
Original answer:
Your function is fine, your math is off.
GUESS and x should not (necessarily) be close, but GUESS * GUESS and x should be.
MySqrt <- function (x, eps = 1e-6, itmax = 100, verbose = TRUE){
i <- 1
myvector <- integer(0)
GUESS <- readline(prompt="Enter your guess: ")
GUESS <- as.integer(GUESS)
while(i <= itmax){
GUESS <- (GUESS + (x/GUESS)) * 0.5
myvector <- c(myvector, GUESS)
if (abs((GUESS*GUESS)-x) < eps) break ### <~~~~ SEE HERE
i <- i + 1
}
myvector
}
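For testing, it can help to take the starting guess as an argument instead of reading it with readline. The sketch below is one way to do that, stopping when successive iterates differ by less than eps; newton_sqrt and its guess argument are illustrative names, not part of either answer.

```r
# Newton's method for sqrt(x), returning every iterate including the start.
# newton_sqrt and the guess argument are hypothetical names for this sketch.
newton_sqrt <- function(x, guess = x / 2, eps = 1e-6, itmax = 100) {
  stopifnot(x > 0, guess > 0)
  path <- numeric(itmax + 1)   # preallocate: starting guess plus itmax updates
  path[1] <- guess
  for (i in seq_len(itmax)) {
    path[i + 1] <- (path[i] + x / path[i]) / 2
    # stop once the update no longer changes the estimate by eps or more
    if (abs(path[i + 1] - path[i]) < eps) {
      return(path[seq_len(i + 1)])
    }
  }
  warning("itmax reached before the stopping criterion was met")
  path
}

iterates <- newton_sqrt(2, guess = 1.4)
iterates
```

Unlike the version that breaks before storing, this one keeps the final iterate, so the last element of the result is the best estimate.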
