How to Code a Recursive Function by Vector in R?

I have a function: (1 - (x / d)),
where d ranges over the members of a vector V.
The resulting function depends on the length of the vector.
For example, if the vector is V = [2, 3.5, 5, 4.1],
the function would be:
[(1-(x/2))*(1-(x/3.5))*(1-(x/5))*(1-(x/4.1))]
If I give it another vector such as [1.5, 2], the function would be:
[(1-(x/1.5))*(1-(x/2))]
That means the function's shape depends on the length of my vector and its elements.
I want code to create this function and then find its maximum with optimize in R.

Here is a way. Function f takes the vector d and returns a function of a single value x; use sapply to evaluate it at several points.
f <- function(d) {
force(d)
function(x) prod(1 - x/d)
}
d <- c(1.5, 2)
g <- f(d)
sapply(1:5, g)
#[1] 0.1666667 0.0000000 0.5000000 1.6666667 3.5000000
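The question also asks for the maximum via optimize. A minimal sketch, assuming the search is restricted to an interval between two adjacent roots (3.5 and 4.1 here, chosen only for illustration), because for this V the product grows without bound as x moves far from the roots:

```r
f <- function(d) {
  force(d)
  function(x) prod(1 - x / d)
}

V <- c(2, 3.5, 5, 4.1)
g <- f(V)

# optimize() evaluates g at one x at a time, so g does not need to be vectorized.
# The interval is an assumption: it brackets the local maximum between two roots.
res <- optimize(g, interval = c(3.5, 4.1), maximum = TRUE)
res$maximum    # location of the local maximum inside (3.5, 4.1)
res$objective  # value of g at that location
```

optimize only searches the interval it is given, so a different interval can return a different local maximum.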

Related

Matrix calculations within an R function

I am trying to code a function which will identify which row of an nxm matrix M is closest to a vector y of length m.
What am I doing wrong in my code, please? I want the function to produce a column vector of length n giving the distance between each row of the matrix and the vector y. I then want to output the number of the row of the matrix that is closest to the vector.
closest.point <- function(M, y) {
p <- length(y)
k <- nrow(M)
T <- matrix(nrow=k)
T <- for(i in 1:n)
for(j in 1:m) {
(X[i,j] - x[j])^2 + (X[i,j] - x[j])^2
}
W <- rowSums(T)
max(W)
df[which.max(W),]
}
Although there is a better approach to this problem (avoiding for loops when working with matrices), here is a solution that keeps your for-loop approach.
There were some mistakes in your function. Several variables, such as n, m and X, are undefined.
Also, try to avoid naming variables T, because R interprets T as TRUE. It works, but it can cause errors if later code uses T to mean TRUE.
When looping, you need to index the variable you are updating, as in T.matrix[i, j]; assigning to T.matrix alone overwrites the whole object at every iteration.
closest.point <- function(M, y) {
k <- nrow(M)
m <- ncol(M)
T.matrix <- matrix(nrow = k, ncol = m)
for (i in 1:k) {
for (j in 1:m) {
T.matrix[i, j] <- (M[i,j] - y[j])^2
}
}
W <- rowSums(T.matrix)
return(which.min(W))
}
# example 1
closest.point(M = rbind(c(1, 1, 1),
c(1, 2, 5)),
y = cbind(c(1, 2, 5)))
# [1] 2
# example 2
closest.point(M = rbind(c(1, 1, 1, 1),
c(1, 2, 5, 7)),
y = cbind(c(2, 2, 6, 2)))
# [1] 2
You should try to avoid using for loops for operations on vectors and matrices. The base function dist calculates distances, and which.min then gives you the index of the minimal distance.
set.seed(0)
M <- matrix(rnorm(100), ncol = 5)
y <- rnorm(5)
closest_point <- function(M, y) {
dist_mat <- as.matrix(dist(rbind(M, y)))
all_distances <- dist_mat[1:nrow(M),ncol(dist_mat)]
which.min(all_distances)
}
closest_point(M, y)
#>
#> 14
Created on 2021-12-10 by the reprex package (v2.0.1)
Hope this makes sense, let me know if you have questions.
There are a number of problems here:
p is defined but never used.
Although not wrong, T does not really have to be a matrix. A vector would be sufficient.
Although not wrong, using T as a variable is dangerous because T also means TRUE.
The code defines T and then immediately throws it away, overwriting it in the next statement. The first statement defining T is never used.
for always has the value NULL, so assigning it to T is pointless.
The double for loop doesn't do anything. There are no assignments in it, so the loops have no effect.
The loops refer to m, n, X and x, but these are not defined anywhere.
(X[i,j] - x[j])^2 is repeated. It is only needed once.
Writing max(W) on a line by itself has no effect inside a function. It only prints when run directly in the console. If you meant to print it, write print(max(W)).
We want the closest point, not the farthest point, so max should be min.
df is used in the last line but is not defined anywhere.
The question is incomplete without a test run.
I have tried to make the minimum changes to make this work:
closest.point <- function(M, y) {
nr <- nrow(M)
nc <- ncol(M)
W <- numeric(nr) # vector having nr zeros
for(i in 1:nr) {
for(j in 1:nc) {
W[i] <- W[i] + (M[i,j] - y[j])^2
}
}
print(W)
print(min(W))
M[which.min(W),]
}
set.seed(123)
M <- matrix(rnorm(12), 4); M
## [,1] [,2] [,3]
## [1,] -0.56047565 0.1292877 -0.6868529
## [2,] -0.23017749 1.7150650 -0.4456620
## [3,] 1.55870831 0.4609162 1.2240818
## [4,] 0.07050839 -1.2650612 0.3598138
y <- rnorm(3); y
## [1] 0.4007715 0.1106827 -0.5558411
closest.point(M, y)
## [1] 0.9415062 2.9842785 4.6316069 2.8401691 <--- W
## [1] 0.9415062 <--- min(W)
## [1] -0.5604756 0.1292877 -0.6868529 <-- closest row
That said the calculation of the closest row can be done in this function with a one-line body. We transpose M and then subtract y from it which will subtract y from each column but the columns of the transpose are the rows of M so this subtracts y from each row. Then take the column sums of the squared differences and find which one is least. Subscript M using that.
closest.point2 <- function(M, y) {
M[which.min(colSums((t(M) - y)^2)), ]
}
closest.point2(M, y)
## [1] -0.5604756 0.1292877 -0.6868529 <-- closest row
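To make the recycling step in that one-liner easier to follow, here is a small sketch on a hand-made matrix (the values are assumptions chosen so the distances are easy to check). It also compares the result with sweep, which subtracts y from each row without transposing:

```r
M <- matrix(c(1, 1, 1,
              1, 2, 5,
              0, 2, 4), nrow = 3, byrow = TRUE)
y <- c(1, 2, 5)

# t(M) has the rows of M as its columns; y is recycled down each column,
# so every column of t(M) - y is one row of M minus y.
diffs <- t(M) - y
sq_dist <- colSums(diffs^2)   # squared distance from each row of M to y
sq_dist
closest <- which.min(sq_dist) # index of the closest row

# Same squared distances via sweep(), which subtracts y[j] from column j of M.
sq_dist_sweep <- rowSums(sweep(M, 2, y)^2)
```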

Pass function name as argument in mapply?

I would like to pass a function name as an argument in mapply:
f2 <- function(a, b) a + b^2
f <- function(a, b, func) func(a, b)
f(1, 3, f2) ## returns 10
mapply(f2, 1:2, 3) ## returns [1] 10 11
mapply(function(a, b) f(a, b, f2), 1:2, 3) ## returns [1] 10 11
mapply(f, 1:2, 3, f2) ## fails
The final mapply call generates the error
Error in dots[[3L]][[1L]] : object of type 'closure' is not subsettable
Is there any way to do this?
mapply assumes you want to iterate over all the vectors you pass after the first function. But you want to use the same value of f2 for every iteration. You can do that using the MoreArgs= parameter:
mapply(f, 1:2, 3, MoreArgs=list(func=f2))
You don't have the same problem with the 3 because R will perform vector recycling to expand 3 to c(3,3) to match the length of c(1,2). Functions in R are not vectors, so they are not recycled this way. If you want a value to stay the same on every iteration, it's better to put it in the MoreArgs parameter.
1) Wrap the function in a list:
mapply(f, 1:2, 3, list(f2))
## [1] 10 11
2) Typically functions that have function arguments use match.fun so that one can pass either the function or a character string containing its name. For example, mapply itself does that so the above line of code could equally be written as: mapply("f", 1:2, 3, list(f2)) . If f were written that way then we could simply specify the name of f2 as a character string, namely "f2" .
f <- function(a, b, func) {
func <- match.fun(func)
func(a, b)
}
mapply(f, 1:2, 3, "f2")
## [1] 10 11
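For comparison, the approaches from this thread can be run side by side. This is only a sketch that restates the calls already shown, using the original definition of f without match.fun:

```r
f2 <- function(a, b) a + b^2
f  <- function(a, b, func) func(a, b)

# Fixed argument passed once through MoreArgs.
via_more_args <- mapply(f, 1:2, 3, MoreArgs = list(func = f2))

# Function wrapped in a length-1 list, which mapply can recycle.
via_list <- mapply(f, 1:2, 3, list(f2))

# Anonymous wrapper that closes over f2.
via_wrapper <- mapply(function(a, b) f(a, b, f2), 1:2, 3)
```

All three return the same vector as mapply(f2, 1:2, 3).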

Simplifying matrix product with one unknown variable

I have to compute a product of 3 matrices D=ABC with:
A is a (1x3) matrix,
B is a (3x3) matrix,
C is a (3x1) matrix (and is equal to A', if it matters)
The result of this product is a simple value, and the calculation is very straightforward in R.
My problem is that there is one unknown, namely X, inside A and C, and I would like to get the result as a formula: D = ABC = f(X).
Is there any way I could achieve this with R ?
Define D as shown below where argument B is the square matrix and A is a function of x returning a vector.
D <- function(B, A) function(x) t(A(x)) %*% B %*% A(x)
# test
A <- function(x) seq(3) * x
B <- matrix(1:9, 3)
Dfun <- D(B, A)
Dfun(10)
## [1] 22800
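If A happens to be linear in x, as in the test above where A(x) = seq(3) * x, the result collapses to a single coefficient times x^2. A hedged sketch of that special case (a0 and quad_coef are names introduced here for illustration, and drop() is added to turn the 1x1 matrix into a plain number):

```r
B  <- matrix(1:9, 3)
a0 <- seq(3)                  # assumed fixed part of A, so that A(x) = a0 * x
A  <- function(x) a0 * x

# Same construction as above, with drop() to return a scalar.
D <- function(B, A) function(x) drop(t(A(x)) %*% B %*% A(x))
Dfun <- D(B, A)

# For linear A, D(x) = x^2 * (a0' B a0), so one coefficient gives the formula.
quad_coef <- drop(t(a0) %*% B %*% a0)
quad_coef   # coefficient of x^2
Dfun(10)
```

This only applies when x enters A as a common factor; for other forms of A the closure Dfun is still the way to evaluate D numerically.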

Return integrated function using R

Let's say i have a function defined as the following in R:
> f <- function(x) 0.5*sin(x)*(x>=0)*(x<=pi)
i can do this to integrate it between 0 and pi:
> Integrate <- function(f,a,b) integrate(Vectorize(f),a,b)$value
> F <- Integrate(f,0,pi)
But if i want to evaluate and return some values of F i get this error:
> F(c(-100,0,1,2,pi,100))
Error in F(c(-100, 0, 1, 2, pi, 100)) :
function "F" is not found
I understand that this is because my Integrate <- function(f,a,b) returns a constant value C, the result of integrating f between a and b. But how can I return F as a function so that I can evaluate it on a vector and plot it?
In this case F should give 0 for any value less than 0 and 1 for any value bigger than pi, and vary between them.
Thanks.
Edit: to sum it up more clearly: how can I define a function f(x) on [a,b] that gives me f(x) if x is in [a,b], 0 if x < a, and 1 if x > b?
Try wrapping your function call in an sapply and have Integrate return a function.
Integrate <- function(f, a, b) function(x) if (x < a) 0 else if (x > b) 1 else integrate(Vectorize(f), a, x)$value
F <- Integrate(f, 0, pi)
sapply(c(-100,0,1,2,pi,100), F)
gives
[1] 0.0000000 0.0000000 0.2298488 0.7080734 1.0000000 1.0000000
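A variation, sketched under the assumption that the upper limit should simply be clamped to [a, b] instead of hard-coding 1 above b (the hard-coded 1 is only right when f integrates to 1 over [a, b]). The helper name F1 is hypothetical, and Vectorize is applied to the result so that F accepts a vector directly:

```r
f <- function(x) 0.5 * sin(x) * (x >= 0) * (x <= pi)

Integrate <- function(f, a, b) {
  F1 <- function(x) {
    upper <- min(max(x, a), b)     # clamp x to [a, b]
    if (upper == a) return(0)      # nothing to integrate at or below a
    integrate(f, a, upper)$value   # f is already vectorized, as integrate() requires
  }
  Vectorize(F1)
}

F <- Integrate(f, 0, pi)
vals <- F(c(-100, 0, 1, 2, pi, 100))
vals
```

Because F is vectorized, it can also be passed to curve() or plot() without an extra sapply.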

how to calculate the Euclidean norm of a vector in R?

I tried norm, but I think it gives the wrong result: the norm of c(1, 2, 3) should be sqrt(1*1+2*2+3*3), but it returns 6.
x1 <- 1:3
norm(x1)
# Error in norm(x1) : 'A' must be a numeric matrix
norm(as.matrix(x1))
# [1] 6
as.matrix(x1)
# [,1]
# [1,] 1
# [2,] 2
# [3,] 3
norm(as.matrix(x1))
# [1] 6
Does anyone know what's the function to calculate the norm of a vector in R?
norm(c(1,1), type="2") # 1.414214
norm(c(1, 1, 1), type="2") # 1.732051
This is a trivial function to write yourself:
norm_vec <- function(x) sqrt(sum(x^2))
I was surprised that nobody had tried profiling the methods suggested above, so I did that. I used a random uniform function to generate a list and used that for repetition (just a simple back-of-the-envelope benchmark):
> uut <- lapply(1:100000, function(x) {runif(1000, min=-10^10, max=10^10)})
> norm_vec <- function(x) sqrt(sum(x^2))
> norm_vec2 <- function(x){sqrt(crossprod(x))}
>
> system.time(lapply(uut, norm_vec))
user system elapsed
0.58 0.00 0.58
> system.time(lapply(uut, norm_vec2))
user system elapsed
0.35 0.00 0.34
> system.time(lapply(uut, norm, type="2"))
user system elapsed
6.75 0.00 6.78
> system.time(lapply(lapply(uut, as.matrix), norm))
user system elapsed
2.70 0.00 2.73
It seems that taking the power and then the sqrt manually is faster than the built-in norm, at least for real-valued vectors. This is probably because norm internally does an SVD:
> norm
function (x, type = c("O", "I", "F", "M", "2"))
{
if (identical("2", type)) {
svd(x, nu = 0L, nv = 0L)$d[1L]
}
else .Internal(La_dlange(x, type))
}
and the SVD function internally converts the vector into a matrix, and does more complicated stuff:
> svd
function (x, nu = min(n, p), nv = min(n, p), LINPACK = FALSE)
{
x <- as.matrix(x)
...
EDIT (20 Oct 2019):
There have been some comments to point out the correctness issue which the above test case doesn't bring out:
> norm_vec(c(10^155))
[1] Inf
> norm(c(10^155), type="2")
[1] 1e+155
This happens because squaring 10^155 gives 10^310, which exceeds the largest double R can represent (about 1.8e308) and overflows to infinity:
> 10^309
[1] Inf
So, it looks like:
It seems that taking the power and then the sqrt manually is faster than the built-in norm for real-valued vectors with small numbers.
How small? So that the sum of squares doesn't overflow.
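The threshold can be read off .Machine. A minimal sketch, assuming standard IEEE double precision (where the largest finite double is about 1.8e308):

```r
norm_vec <- function(x) sqrt(sum(x^2))

.Machine$double.xmax         # largest finite double
sqrt(.Machine$double.xmax)   # rough bound on component size before x^2 overflows

small_enough <- norm_vec(1e154)   # 1e154^2 = 1e308 still fits in a double
too_large    <- norm_vec(1e155)   # 1e155^2 = 1e310 overflows to Inf
```

For longer vectors the bound is tighter, because it is the sum of the squares that has to stay below double.xmax.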
norm(x, type = c("O", "I", "F", "M", "2"))
The default is "O".
"O", "o" or "1" specifies the one norm, (maximum absolute column sum);
"F" or "f" specifies the Frobenius norm (the Euclidean norm of x treated as if it were a vector);
norm(as.matrix(x1),"o")
The result is 6, same as norm(as.matrix(x1))
norm(as.matrix(x1),"f")
The result is sqrt(1*1+2*2+3*3)
So, norm(as.matrix(x1),"f") is answer.
We can also find the norm as :
Result<-sum(abs(x)^2)^(1/2)
Or you can also try:
Result<-sqrt(t(x)%*%x)
Both will give the same answer
I'mma throw this out there too as an equivalent R expression
norm_vec <- function(x){sqrt(crossprod(x))}
Don't confuse R's crossprod with a similarly named vector/cross product. That naming is known to cause confusion especially for those with a physics/mechanics background.
A k-norm of a vector (the Euclidean length is the case k = 2), with scaling to avoid destructive underflow and overflow, is
norm <- function(x, k) { max(abs(x))*(sum((abs(x)/max(abs(x)))^k))^(1/k) }
See below for explanation.
1. Euclidean length of a vector with no scaling:
norm() here is a function that computes the length of each column vector. It takes two arguments: x, a matrix whose columns are the vectors, and k, the type of norm. Note that defining it under this name masks base R's norm().
norm <- function(x, k) {
# x = matrix with column vector and with dimensions mx1 or mxn
# k = type of norm with integer from 1 to +Inf
stopifnot(k >= 1) # check that k is at least 1
stopifnot(length(k) == 1) # check for length of k to be 1. The variable k is not vectorized.
if(k == Inf) {
# infinity norm
return(apply(x, 2, function(vec) max(abs(vec)) ))
} else {
# k-norm
return(apply(x, 2, function(vec) (sum((abs(vec))^k))^(1/k) ))
}
}
x <- matrix(c(1,-2,3,-4)) # column matrix
sapply(c(1:4, Inf), function(k) norm(x = x, k = k))
# [1] 10.000000 5.477226 4.641589 4.337613 4.000000
As k increases, the k-norm decreases from the 1-norm (10.0) toward the infinity-norm (4.0).
The k-norm is more commonly called the p-norm; the Euclidean norm in n-dimensional Euclidean space is the special case k = 2.
Note:
In the norm() function definition, for vectors with real components, the absolute values can be dropped in norm-2k or even indexed norms, where k >= 1.
If you are confused with the norm function definition, you can read each one individually as given below.
norm_1 <- function(x) sum(abs(x))
norm_2 <- function(x) (sum((abs(x))^2))^(1/2)
norm_3 <- function(x) (sum((abs(x))^3))^(1/3)
norm_4 <- function(x) (sum((abs(x))^4))^(1/4)
norm_k <- function(x, k) (sum((abs(x))^k))^(1/k)
norm_inf <- function(x) max(abs(x))
2. Euclidean length of a vector with scaling to avoid destructive overflow and underflow issues:
Note-2:
The only problem with this solution norm() is that it does not guard against overflow or underflow problems as alluded here and here.
Fortunately, someone had already solved this problem for the 2-norm (Euclidean length) in the BLAS (Basic Linear Algebra Subprograms) Fortran library. A description of this problem can be found in the textbook "Numerical Methods and Software" by Kahaner, Moler and Nash, Chapter 1, Section 1.3, pages 7-9.
The Fortran subroutine is dnrm2.f, which handles destructive overflow and underflow in the norm by scaling with the largest absolute value among the vector components. The overflow and underflow arise because the components are raised to a power before the root is taken.
I will show how to implement dnrm2.f in R below.
#1. find the maximum absolute value among the components of vector x
max_x <- max(abs(x))
#2. scale (divide) the components of the vector by max_x
scaled_x <- x/max_x
#3. take the square of the scaled vector x
sq_scaled_x <- (scaled_x)^2
#4. sum the squares of the scaled vector x
sum_sq_scaled_x <- sum(sq_scaled_x)
#5. take the square root of sum_sq_scaled_x
rt_sum_sq_scaled_x <- sqrt(sum_sq_scaled_x)
#6. multiply max_x by rt_sum_sq_scaled_x
max_x*rt_sum_sq_scaled_x
The one-liner for the six steps above in R is:
# Euclidean length of vector - 2norm
max(abs(x))*sqrt(sum((x/max(abs(x)))^2))
Let's try example vectors to compute the 2-norm (see other solutions in this thread) for this problem.
x = c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299)
max(abs(x))*sqrt(sum((x/max(abs(x)))^2))
# [1] 1.227355e+300
x <- (c(1,-2,3,-4))
max(abs(x))*sqrt(sum((x/max(abs(x)))^2))
# [1] 5.477226
Therefore, the recommended way to implement a generalized k-norm in R is the single line below, which guards against destructive overflow and underflow. To improve on this one-liner, you can combine norm() without scaling, for vectors whose components are neither too small nor too large, with knorm() with scaling, for vectors with very small or very large components, because scaling every vector adds unnecessary calculations. I did not implement this improvement in knorm() given below.
# one-liner for k-norm - generalized form for all norms including infinity-norm:
max(abs(x))*(sum((abs(x)/max(abs(x)))^k))^(1/k)
# knorm() function using the above one-liner.
knorm <- function(x, k) {
# x = matrix with column vector and with dimensions mx1 or mxn
# k = type of norm with integer from 1 to +Inf
stopifnot(k >= 1) # check that k is at least 1
stopifnot(length(k) == 1) # check for length of k to be 1. The variable k is not vectorized.
# convert the elements of the matrix to their absolute values
x <- abs(x)
if(k == Inf) { # infinity-norm
return(apply(x, 2, function(vec) max(vec)))
} else { # k-norm
return(apply(x, 2, function(vec) {
max_vec <- max(vec)
return(max_vec*(sum((vec/max_vec)^k))^(1/k))
}))
}
}
# 2-norm
x <- matrix(c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299))
sapply(2, function(k) knorm(x = x, k = k))
# [1] 1.227355e+300
# 1-norm, 2-norm, 3-norm, 4-norm, and infinity-norm
sapply(c(1:4, Inf), function(k) knorm(x = x, k = k))
# [1] 2.480000e+300 1.227355e+300 9.927854e+299 9.027789e+299 8.000000e+299
x <- matrix(c(1,-2,3,-4))
sapply(c(1:4, Inf), function(k) knorm(x = x, k = k))
# [1] 10.000000 5.477226 4.641589 4.337613 4.000000
x <- matrix(c(1,-2,3,-4, 0, -8e+299, -6e+299, 5e+299, -8e+298, -5e+299), nc = 2)
sapply(c(1:4, Inf), function(k) knorm(x = x, k = k))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.00e+01 5.477226e+00 4.641589e+00 4.337613e+00 4e+00
# [2,] 2.48e+300 1.227355e+300 9.927854e+299 9.027789e+299 8e+299
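The scaling idea above can also be packaged for plain vectors. This is a minimal sketch, not a port of dnrm2.f: the name norm2_scaled is hypothetical, and the zero-vector guard is an addition that avoids dividing by zero:

```r
norm2_scaled <- function(x) {
  s <- max(abs(x))          # scale by the largest absolute component
  if (s == 0) return(0)     # all-zero vector: avoid 0/0
  s * sqrt(sum((x / s)^2))
}

big  <- c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299)
tiny <- c(1e-200, 1e-200)

sqrt(sum(big^2))      # naive version overflows to Inf
norm2_scaled(big)     # scaled version stays finite
sqrt(sum(tiny^2))     # naive version underflows to 0
norm2_scaled(tiny)    # scaled version keeps the small magnitude
```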
If you have a data.frame or a data.table 'DT' and want to compute the Euclidean norm (2-norm) across each row, the apply function can be used.
apply(X = DT, MARGIN = 1, FUN = norm, '2')
Example:
>DT
accx accy accz
1: 9.576807 -0.1629486 -0.2587167
2: 9.576807 -0.1722938 -0.2681506
3: 9.576807 -0.1634264 -0.2681506
4: 9.576807 -0.1545590 -0.2681506
5: 9.576807 -0.1621254 -0.2681506
6: 9.576807 -0.1723825 -0.2682434
7: 9.576807 -0.1723825 -0.2728810
8: 9.576807 -0.1723825 -0.2775187
> apply(X = DT, MARGIN = 1, FUN = norm, '2')
[1] 9.581687 9.582109 9.581954 9.581807 9.581932 9.582114 9.582245 9.582378
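An alternative for the same row-wise computation is sketched below. It assumes every column is numeric and the values are far from the overflow range, and it uses a two-row data.frame copied from the example above instead of the full table:

```r
DT <- data.frame(accx = c(9.576807, 9.576807),
                 accy = c(-0.1629486, -0.1722938),
                 accz = c(-0.2587167, -0.2681506))

# Square every entry, sum across each row, then take the square root.
row_norms <- sqrt(rowSums(DT^2))

# Reference values computed with the apply/norm call from the answer above.
row_norms_apply <- apply(X = DT, MARGIN = 1, FUN = norm, '2')
```

rowSums avoids one norm call (and one SVD) per row, which matters mostly for tables with many rows.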
Following AbdealiJK's answer,
I experimented further to gain some insight.
Here's one.
x = c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299)
sqrt(sum(x^2))
norm(x, type='2')
The first result is Inf and the second one is 1.227355e+300, which is correct, as the arbitrary-precision check below shows.
library(Rmpfr)
y <- mpfr(x, 120)
sqrt(sum(y*y))
The result is 1227354879.... I didn't count the number of trailing digits, but it looks right. I know there is another way around this overflow problem, which is to apply the log function to all the numbers first and then sum, but I do not have time to implement it!
Create your matrix column-wise using cbind; then the norm function works well with the Frobenius norm (the Euclidean norm) as an argument.
x1<-cbind(1:3)
norm(x1,"f")
[1] 3.741657
sqrt(1*1+2*2+3*3)
[1] 3.741657