I have a table in R where the rownames are (as per default) A,B,C,... and the column names are 1,2,3,4,... as assigned. For example, the output of
x <- as.table(matrix(c(2,20,3,4,2,5,8,1,3),nrow=3,ncol=3,byrow=TRUE))
colnames(x) <- seq(1,ncol(x))
x
Now I'd like to permute the table by a matrix P, which I've already found, containing only 1s and 0s (the point being that A %*% P will permute the columns of A so as to maximize its diagonal). In this case, for x you can get P as
P <- matrix(c(0,0,1,1,0,0,0,1,0),nrow=3,byrow=T)
Note P will have only one '1' per row and column, the rest '0'. My issue is: if you do something like
Y <- x %*% P
you will see that, while the diagonals are rightly arranged, the column names from x have been replaced by matrix column names from P ([,1] [,2] [,3] in this case).
How can I perform the permutation (x %*% P) while retaining the column names from x in the correct order? That is to say, the column name follows the column when the column moves. So in this case, the column names would be 2 3 1.
You'll have to permute the column names of x in the same way. For example:
colnames(Y) <- 1:3 %*% P
Y
which prints
   2 3 1
A 20 3 2
B  2 5 4
C  1 3 8
This was extra simple because the original column names were integers 1:3. In general, you'll need something like
colnames(Y) <- colnames(x)[1:3 %*% P]
To check, permute columns of Y:
Z <- Y %*% P
colnames(Z) <- colnames(Y)[1:3 %*% P]
Z
which prints
  3 1  2
A 3 2 20
B 5 4  2
C 3 8  1
Edited to add: As came out in the comments, if P is computed numerically, it might not contain exact 0 and 1 values, so you should use
colnames(Y) <- colnames(x)[1:3 %*% round(P)]
to avoid rounding error.
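An alternative that avoids relabeling altogether is to turn P into an index vector and subset the columns of x directly. This is a sketch of a different technique from the matrix product above, not part of the original answer; idx is a name introduced here for illustration.

```r
x <- as.table(matrix(c(2, 20, 3, 4, 2, 5, 8, 1, 3), nrow = 3, byrow = TRUE))
colnames(x) <- seq_len(ncol(x))
P <- matrix(c(0, 0, 1, 1, 0, 0, 0, 1, 0), nrow = 3, byrow = TRUE)

# idx[j] is the row of P that holds the 1 in column j, i.e. the column of x
# that lands in position j of x %*% P (round() guards against inexact 0/1)
idx <- apply(round(P), 2, which.max)

# Indexing moves each column together with its name
Y <- x[, idx]
colnames(Y)
```

Because the names travel with the columns, no separate colnames() assignment is needed.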
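If the intention is simply to keep the dimnames of x, copy 'x' to 'Y' and assign into Y[]. Note that this keeps the column names in their original order (1 2 3); they do not move with the columns.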
Y <- x
Y[] <- x %*% P
When comparing two vectors it is simple to calculate the angle between them, but in R it is noticeably harder to calculate the angle between a vector and a matrix of vectors efficiently.
Say you have a 2D vector A=(2, 0) and then a matrix B={(1,3), (-2,4), (-3,-3), (1,-4)}. I am interested in working out the smallest angle between A and the vectors in B.
If I try to use
min(acos( sum(a%*%b) / ( sqrt(sum(a %*% a)) * sqrt(sum(b %*% b)) ) ))
it fails as they are non-conformable arguments.
Is there any code similar to the above that can handle a vector and a matrix?
Note: At the risk of being marked as a duplicate, the solutions found in several sources do not apply in this case.
Edit: The reason for this is I have a large matrix X, and A is just one row of this. I am reducing the number of elements based solely on the angle of each vector. The first element of B is the first in X, and then if the angle between any element in B and the next element X[,2] (here A) is greater than a certain tolerance, this is added to the list B. I am just using B<-rbind(B,X[,2]) to do this, so this results in B being a matrix.
You don't describe the format of A and B in detail, so I assume A is a plain vector and B is a matrix whose rows are the vectors.
(A <- c(2, 0))
# [1] 2 0
(B <- rbind(c(1,3), c(-2,4), c(-3,-3), c(1,-4)))
#      [,1] [,2]
# [1,]    1    3
# [2,]   -2    4
# [3,]   -3   -3
# [4,]    1   -4
Solution 1 with apply():
apply(B, 1, FUN = function(x){
acos(sum(x*A) / (sqrt(sum(x*x)) * sqrt(sum(A*A))))
})
# [1] 1.249046 2.034444 2.356194 1.325818
Solution 2 with sweep(): (replace sum() above with rowSums())
sweep(B, 2, A, FUN = function(x, y){
acos(rowSums(x*y) / (sqrt(rowSums(x*x)) * sqrt(rowSums(y*y))))
})
# [1] 1.249046 2.034444 2.356194 1.325818
Solution 3 with split() and mapply:
mapply(function(x, y){
acos(sum(x*y) / (sqrt(sum(x*x)) * sqrt(sum(y*y))))
}, split(B, row(B)), list(A))
# 1 2 3 4
# 1.249046 2.034444 2.356194 1.325818
The vector of dot products between the rows of B and the vector A is B %*% A. The vector lengths of the rows of B are sqrt(rowSums(B^2)).
To find the smallest angle, you want the largest cosine, but you don't actually need to compute the angle, so the length of A doesn't matter.
Thus the row with the smallest angle will be given by row <- which.max((B %*% A)/sqrt(rowSums(B^2))). With Darren's data, that's row 1.
If you really do need the smallest angle, then you can apply the formula for two vectors to B[row,] and A. If you need all of the angles, then the formula would be
acos((B %*% A)/sqrt(rowSums(B^2))/sqrt(sum(A^2)))
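Putting the pieces of this answer together, a minimal sketch might look like the following. The clamping step with pmin()/pmax() is an addition made here (an assumption, not in the original answer) to keep rounding error from pushing a cosine slightly outside [-1, 1], where acos() would return NaN.

```r
A <- c(2, 0)
B <- rbind(c(1, 3), c(-2, 4), c(-3, -3), c(1, -4))

# Cosine of the angle between A and every row of B, in one vectorized step
cosines <- as.vector(B %*% A) / (sqrt(rowSums(B^2)) * sqrt(sum(A^2)))

# Clamp to [-1, 1] before acos() (added safeguard, see note above)
cosines <- pmin(pmax(cosines, -1), 1)

angles <- acos(cosines)       # all angles, in radians
closest <- which.min(angles)  # row of B with the smallest angle to A
min_angle <- angles[closest]
```

Here closest is row 1, matching the which.max() result above.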
I have to compute a product of 3 matrices D=ABC with:
A is a (1x3) matrix,
B is a (3x3) matrix,
C is a (3x1) matrix (and is equal to A', if it matters)
The result of this product is a simple value, and the calculation is very straightforward in R.
My problem is that there is one unknown, namely X, inside A and C, and I would like to get the result as a formula: D = ABC = f(X).
Is there any way I could achieve this with R ?
Define D as shown below where argument B is the square matrix and A is a function of x returning a vector.
D <- function(B, A) function(x) t(A(x)) %*% B %*% A(x)
# test
A <- function(x) seq(3) * x
B <- matrix(1:9, 3)
Dfun <- D(B, A)
Dfun(10)
## [1] 22800
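As a usage sketch, the returned closure can be evaluated over several values of x. Two things here are additions for illustration rather than part of the answer: drop() turns the 1x1 matrix product into a plain number, and sapply() is used because the closure is not vectorized over x.

```r
# Same construction as above, with drop() to return a plain number
D <- function(B, A) function(x) drop(t(A(x)) %*% B %*% A(x))

A <- function(x) seq(3) * x
B <- matrix(1:9, 3)
Dfun <- D(B, A)

# Evaluate f(X) at several points
xs <- c(1, 2, 10)
vals <- sapply(xs, Dfun)

# For this particular A (linear in x) the result is quadratic: f(x) = x^2 * f(1)
ratio <- vals / xs^2
```

This gives a function of X numerically; it does not produce a symbolic formula.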
I would like to generate numbers from a triangular distribution with three parameters: a, b, c where c in my case is defined as (a+b)/2.
Let's say I have a vector x:
x <- c(1,-1,2,-2,3,-3,4,-4,5,-5,11,-11,12,-12,13,-13)
I want to generate as many new values as there are negative numbers in vector x, so that I can then replace the negative values with numbers generated from the triangular distribution.
library(triangle)
c = abs(x[x<0])/2
sample <- rtriangle(length(x[which(x<0)]), 0, abs(x[x<0]),c)
Obviously this does not work, as I get the following warnings:
Warning messages:
1: In if (a > c | b < c) return(rep(NaN, times = n)) :
the condition has length > 1 and only the first element will be used
2: In if (a != c) { :
the condition has length > 1 and only the first element will be used
3: In p[i] * (b - a) :
longer object length is not a multiple of shorter object length
4: In p[i] <- a + sqrt(p[i] * (b - a) * (c - a)) :
number of items to replace is not a multiple of replacement length
5: In (1 - p[j]) * (b - a) :
longer object length is not a multiple of shorter object length
6: In p[j] <- b - sqrt((1 - p[j]) * (b - a) * (b - c)) :
number of items to replace is not a multiple of replacement length
Since rtriangle does not accept vectors for its distribution parameters, you can call it once per element with sapply, like this:
x <- c(1,-1,2,-2,3,-3,4,-4,5,-5,11,-11,12,-12,13,-13)
library("triangle")
sample = sapply(abs(x[x<0]), function(x){ rtriangle(1,0,x,x/2) })
> sample
[1] 0.6514940 0.6366981 1.8598445 0.9866790 1.7517438 2.9444719 4.1537113 2.2315813
You will get one random draw from each of the 8 different triangular distributions.
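If you would rather not call the sampler once per element, the draws can also be vectorized in base R with the inverse-CDF method, using the same formulas that appear in the warning messages above. This is a sketch under stated assumptions: rtri_vec is a hypothetical helper written for this example, not a function of the triangle package.

```r
# Hypothetical helper: one draw per element from triangular distributions
# with lower limit a, upper limit b and mode c (inverse-CDF method)
rtri_vec <- function(a, b, c) {
  n <- max(length(a), length(b), length(c))
  u <- runif(n)
  fc <- (c - a) / (b - a)  # value of the CDF at the mode
  ifelse(u < fc,
         a + sqrt(u * (b - a) * (c - a)),        # left of the mode
         b - sqrt((1 - u) * (b - a) * (b - c)))  # right of the mode
}

set.seed(1)
x <- c(1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 11, -11, 12, -12, 13, -13)
upper <- abs(x[x < 0])

# One draw per negative element, with a = 0, b = |x| and c = (a + b) / 2
draws <- rtri_vec(0, upper, upper / 2)

# Replace the negative values with the draws
x_new <- x
x_new[x_new < 0] <- draws
```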
I tried norm, but I think it gives the wrong result (the norm of c(1, 2, 3) is sqrt(1*1+2*2+3*3), but it returns 6).
x1 <- 1:3
norm(x1)
# Error in norm(x1) : 'A' must be a numeric matrix
norm(as.matrix(x1))
# [1] 6
as.matrix(x1)
# [,1]
# [1,] 1
# [2,] 2
# [3,] 3
Does anyone know what's the function to calculate the norm of a vector in R?
norm(c(1,1), type="2") # 1.414214
norm(c(1, 1, 1), type="2") # 1.732051
This is a trivial function to write yourself:
norm_vec <- function(x) sqrt(sum(x^2))
I was surprised that nobody had benchmarked the methods suggested above, so I did. I used a random uniform generator to build a list of vectors and timed each method over it (just a simple back-of-the-envelope benchmark):
> uut <- lapply(1:100000, function(x) {runif(1000, min=-10^10, max=10^10)})
> norm_vec <- function(x) sqrt(sum(x^2))
> norm_vec2 <- function(x){sqrt(crossprod(x))}
>
> system.time(lapply(uut, norm_vec))
user system elapsed
0.58 0.00 0.58
> system.time(lapply(uut, norm_vec2))
user system elapsed
0.35 0.00 0.34
> system.time(lapply(uut, norm, type="2"))
user system elapsed
6.75 0.00 6.78
> system.time(lapply(lapply(uut, as.matrix), norm))
user system elapsed
2.70 0.00 2.73
It seems that taking the power and then the square root manually is faster than the built-in norm, at least for real-valued vectors. This is probably because norm internally does an SVD:
> norm
function (x, type = c("O", "I", "F", "M", "2"))
{
if (identical("2", type)) {
svd(x, nu = 0L, nv = 0L)$d[1L]
}
else .Internal(La_dlange(x, type))
}
and the SVD function internally converts the vector into a matrix, and does more complicated stuff:
> svd
function (x, nu = min(n, p), nv = min(n, p), LINPACK = FALSE)
{
x <- as.matrix(x)
...
EDIT (20 Oct 2019):
Some comments have pointed out a correctness issue that the test case above doesn't bring out:
> norm_vec(c(10^155))
[1] Inf
> norm(c(10^155), type="2")
[1] 1e+155
This happens because squaring 10^155 gives 10^310, which exceeds the largest representable double (about 1.8e308), so the intermediate result overflows to Inf:
> 10^309
[1] Inf
So the conclusion should read: taking the power and then the square root manually is faster than the built-in norm for real-valued vectors, as long as the numbers are small enough that the sum of squares doesn't overflow.
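To make the overflow threshold concrete, here is a small sketch. The variable limit is introduced for illustration; .Machine$double.xmax is base R's largest finite double.

```r
norm_vec <- function(x) sqrt(sum(x^2))

# A single component larger than this overflows when squared
limit <- sqrt(.Machine$double.xmax)

below <- norm_vec(1e154)  # 1e154^2 = 1e308 still fits in a double
above <- norm_vec(1e155)  # 1e155^2 = 1e310 overflows, so this is Inf
```

With several components the safe range is narrower, because it is the sum of the squares that has to stay below .Machine$double.xmax.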
norm(x, type = c("O", "I", "F", "M", "2"))
The default is "O".
"O", "o" or "1" specifies the one norm, (maximum absolute column sum);
"F" or "f" specifies the Frobenius norm (the Euclidean norm of x treated as if it were a vector);
norm(as.matrix(x1),"o")
The result is 6, same as norm(as.matrix(x1))
norm(as.matrix(x1),"f")
The result is sqrt(1*1+2*2+3*3)
So, norm(as.matrix(x1),"f") is the answer.
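For comparison, this sketch evaluates every type that base R's norm() accepts on the same column matrix. The vector res and its element names are introduced here for illustration only.

```r
x1 <- as.matrix(c(1, 2, 3))  # 3 x 1 column matrix

res <- c(
  one  = norm(x1, "O"),  # maximum absolute column sum
  inf  = norm(x1, "I"),  # maximum absolute row sum
  frob = norm(x1, "F"),  # Frobenius norm: sqrt of the sum of squares
  max  = norm(x1, "M"),  # largest absolute entry
  two  = norm(x1, "2")   # spectral norm (largest singular value)
)
res
```

For a single column, "F" and "2" coincide and both equal the Euclidean length.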
We can also compute the norm as:
Result <- sum(abs(x)^2)^(1/2)
or, equivalently:
Result <- sqrt(t(x) %*% x)
Both give the same value, although the second returns it as a 1x1 matrix.
I'mma throw this out there too as an equivalent R expression
norm_vec <- function(x){sqrt(crossprod(x))}
Don't confuse R's crossprod with a similarly named vector/cross product. That naming is known to cause confusion especially for those with a physics/mechanics background.
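A short sketch of the difference: crossprod(x) is t(x) %*% x, a 1x1 matrix holding the sum of squares, whereas the vector (cross) product of two 3D vectors is another vector. The helper cross3 is hypothetical and written out here only for contrast.

```r
x <- c(3, 4)

cp <- crossprod(x)  # 1 x 1 matrix containing sum(x^2) = 25

# drop() strips the matrix dimensions so a plain number is returned
norm_vec <- function(x) sqrt(drop(crossprod(x)))
len <- norm_vec(x)

# Hypothetical helper: the 3D vector (cross) product, a different operation
cross3 <- function(a, b) {
  c(a[2] * b[3] - a[3] * b[2],
    a[3] * b[1] - a[1] * b[3],
    a[1] * b[2] - a[2] * b[1])
}
e3 <- cross3(c(1, 0, 0), c(0, 1, 0))
```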
A k-norm of a vector (the Euclidean length when k = 2), with scaling to avoid destructive underflow and overflow, is:
norm <- function(x, k) { max(abs(x))*(sum((abs(x)/max(abs(x)))^k))^(1/k) }
See below for explanation.
1. Euclidean length of a vector with no scaling:
The norm() function defined here (which masks base R's norm()) computes the k-norm of each column of a matrix. It takes two arguments: x, a matrix whose columns are the vectors, and k, the order of the norm.
norm <- function(x, k) {
# x = matrix with column vector and with dimensions mx1 or mxn
# k = type of norm with integer from 1 to +Inf
stopifnot(k >= 1) # k must be at least 1
stopifnot(length(k) == 1) # check for length of k to be 1. The variable k is not vectorized.
if(k == Inf) {
# infinity norm
return(apply(x, 2, function(vec) max(abs(vec)) ))
} else {
# k-norm
return(apply(x, 2, function(vec) (sum((abs(vec))^k))^(1/k) ))
}
}
x <- matrix(c(1,-2,3,-4)) # column matrix
sapply(c(1:4, Inf), function(k) norm(x = x, k = k))
# [1] 10.000000 5.477226 4.641589 4.337613 4.000000
As k increases, the k-norm decreases from the 1-norm (10.0) toward the infinity-norm (4.0).
The 2-norm is also called the "Euclidean norm in Euclidean n-dimensional space".
Note:
In the norm() function definition, for vectors with real components, the absolute values can be dropped for even-order norms (k = 2, 4, 6, ...).
If you are confused with the norm function definition, you can read each one individually as given below.
norm_1 <- function(x) sum(abs(x))
norm_2 <- function(x) (sum((abs(x))^2))^(1/2)
norm_3 <- function(x) (sum((abs(x))^3))^(1/3)
norm_4 <- function(x) (sum((abs(x))^4))^(1/4)
norm_k <- function(x, k) (sum((abs(x))^k))^(1/k)
norm_inf <- function(x) max(abs(x))
2. Euclidean length of a vector with scaling to avoid destructive overflow and underflow issues:
Note-2:
The only problem with this solution norm() is that it does not guard against overflow or underflow problems as alluded here and here.
Fortunately, someone had already solved this problem for 2-norm (euclidean length) in the blas (basic linear algebra subroutines) fortran library. A description of this problem can be found in the textbook of "Numerical Methods and Software by Kahaner, Moler and Nash" - Chapter-1, Section 1.3, page - 7-9.
The Fortran subroutine is dnrm2 (source file dnrm2.f). It handles destructive overflow and underflow in the 2-norm by scaling with the largest absolute value among the vector components. The problem arises because the components are squared before the square root is taken, so the intermediate sum can overflow or underflow even when the final result is representable.
I will show how to implement dnrm2.f in R below.
#1. find the largest absolute value among the components of vector-x
max_x <- max(abs(x))
#2. scale or divide the components of vector by max_x
scaled_x <- x/max_x
#3. take square of the scaled vector-x
sq_scaled_x <- (scaled_x)^2
#4. sum the square of scaled vector-x
sum_sq_scaled_x <- sum(sq_scaled_x)
#5. take square root of sum_sq_scaled_x
rt_sum_sq_scaled_x <- sqrt(sum_sq_scaled_x)
#6. multiply max_x with rt_sum_sq_scaled_x
max_x*rt_sum_sq_scaled_x
A one-liner for the above 6 steps of dnrm2.f in R is:
# Euclidean length of vector - 2norm
max(abs(x))*sqrt(sum((x/max(abs(x)))^2))
Let's try example vectors to compute the 2-norm (see other solutions in this thread) for this problem.
x = c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299)
max(abs(x))*sqrt(sum((x/max(abs(x)))^2))
# [1] 1.227355e+300
x <- (c(1,-2,3,-4))
max(abs(x))*sqrt(sum((x/max(abs(x)))^2))
# [1] 5.477226
Therefore, the recommended way to implement a generalized k-norm in R is the single line below, which guards against destructive overflow and underflow. To improve on it, you could use norm() without scaling for vectors whose components are neither too small nor too large, and knorm() with scaling otherwise, since scaling every vector adds unnecessary computation. I did not implement this improvement in knorm() given below.
# one-liner for k-norm - generalized form for all norms including infinity-norm:
max(abs(x))*(sum((abs(x)/max(abs(x)))^k))^(1/k)
# knorm() function using the above one-liner.
knorm <- function(x, k) {
# x = matrix with column vector and with dimensions mx1 or mxn
# k = type of norm with integer from 1 to +Inf
stopifnot(k >= 1) # k must be at least 1
stopifnot(length(k) == 1) # check for length of k to be 1. The variable k is not vectorized.
# convert elements of matrix to their absolute values
x <- abs(x)
if(k == Inf) { # infinity-norm
return(apply(x, 2, function(vec) max(vec)))
} else { # k-norm
return(apply(x, 2, function(vec) {
max_vec <- max(vec)
return(max_vec*(sum((vec/max_vec)^k))^(1/k))
}))
}
}
# 2-norm
x <- matrix(c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299))
sapply(2, function(k) knorm(x = x, k = k))
# [1] 1.227355e+300
# 1-norm, 2-norm, 3-norm, 4-norm, and infinity-norm
sapply(c(1:4, Inf), function(k) knorm(x = x, k = k))
# [1] 2.480000e+300 1.227355e+300 9.927854e+299 9.027789e+299 8.000000e+299
x <- matrix(c(1,-2,3,-4))
sapply(c(1:4, Inf), function(k) knorm(x = x, k = k))
# [1] 10.000000 5.477226 4.641589 4.337613 4.000000
x <- matrix(c(1,-2,3,-4, 0, -8e+299, -6e+299, 5e+299, -8e+298, -5e+299), nc = 2)
sapply(c(1:4, Inf), function(k) knorm(x = x, k = k))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.00e+01 5.477226e+00 4.641589e+00 4.337613e+00 4e+00
# [2,] 2.48e+300 1.227355e+300 9.927854e+299 9.027789e+299 8e+299
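One edge case that scaling by the largest component does not cover on its own is an all-zero vector: the maximum is 0, so the division produces NaN. A minimal sketch of a guarded 2-norm follows; safe_norm2 is a hypothetical name used only for this example.

```r
# Hypothetical helper: scaled 2-norm with a guard for all-zero input
safe_norm2 <- function(x) {
  m <- max(abs(x))
  if (m == 0) return(0)  # avoid 0/0 when every component is zero
  m * sqrt(sum((x / m)^2))
}

zero_case  <- safe_norm2(c(0, 0, 0))
small_case <- safe_norm2(c(3, -4))
large_case <- safe_norm2(c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299))
```

The same guard could be placed inside the column-wise function passed to apply() in knorm().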
If you have a data.frame or a data.table 'DT', and want to compute the Euclidean norm (2-norm) across each row, the apply function can be used.
apply(X = DT, MARGIN = 1, FUN = norm, '2')
Example:
>DT
accx accy accz
1: 9.576807 -0.1629486 -0.2587167
2: 9.576807 -0.1722938 -0.2681506
3: 9.576807 -0.1634264 -0.2681506
4: 9.576807 -0.1545590 -0.2681506
5: 9.576807 -0.1621254 -0.2681506
6: 9.576807 -0.1723825 -0.2682434
7: 9.576807 -0.1723825 -0.2728810
8: 9.576807 -0.1723825 -0.2775187
> apply(X = DT, MARGIN = 1, FUN = norm, '2')
[1] 9.581687 9.582109 9.581954 9.581807 9.581932 9.582114 9.582245 9.582378
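A vectorized alternative, offered as a sketch rather than as part of the answer above, is to square the columns and use rowSums(), which avoids one function call per row. The data frame below copies the first two rows of the example; a plain data.frame is assumed here.

```r
DT <- data.frame(
  accx = c(9.576807, 9.576807),
  accy = c(-0.1629486, -0.1722938),
  accz = c(-0.2587167, -0.2681506)
)

# Euclidean norm of each row without an explicit loop
row_norms <- sqrt(rowSums(DT^2))

# Same computation row by row, for comparison
row_norms_apply <- apply(DT, 1, function(r) sqrt(sum(r^2)))
```

Like sqrt(sum(x^2)), this form can overflow for very large components.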
Following AbdealiJK's answer, I experimented further to gain some insight. Here's one example.
x = c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299)
sqrt(sum(x^2))
norm(x, type='2')
The first result is Inf and the second is 1.227355e+300, which is correct, as the arbitrary-precision check below shows.
library(Rmpfr)
y <- mpfr(x, 120)
sqrt(sum(y*y))
The result is 1227354879.... I didn't count the trailing digits, but it looks right. I know there is another way around this overflow problem, which is to take the log of every number first and sum in log space, but I do not have time to implement it!
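For what it's worth, the log-based idea can be sketched with the usual log-sum-exp trick. This is one possible reading of that idea, not something from the answer itself; log_norm2 is a hypothetical name, and the sketch assumes at least one nonzero component.

```r
# Hypothetical helper: 2-norm computed in log space (log-sum-exp)
log_norm2 <- function(x) {
  l <- 2 * log(abs(x))  # logs of the squared components
  m <- max(l)           # shift by the largest term so exp() cannot overflow
  exp(0.5 * (m + log(sum(exp(l - m)))))
}

x <- c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299)
res_log <- log_norm2(x)

# Reference value from scaling by the largest absolute component
res_scaled <- max(abs(x)) * sqrt(sum((x / max(abs(x)))^2))
```

Zero components are fine (log(0) is -Inf and contributes nothing to the sum), but an all-zero vector would need a separate check.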
Create your matrix column-wise using cbind; the norm function then works with the Frobenius norm (the Euclidean norm) as its type argument.
x1<-cbind(1:3)
norm(x1,"f")
[1] 3.741657
sqrt(1*1+2*2+3*3)
[1] 3.741657