I have a two-part question about diag in R and MATLAB.
1) I was wondering if there was a way already developed to access different diagonals of matrices in R similar to the way it is done in Matlab (see http://www.mathworks.com/help/techdoc/ref/diag.html).
2) If no such function exists, how can my code be improved so that it behaves like R's diag, where
diag(x = 1, nrow, ncol) # extracts the diagonal when x is a matrix
diag(x) <- value # inserts values on the diagonal
At present my code returns the elements on the k-th diagonal, but how can it be written so that, when called in the second form above, it inserts values on that diagonal? For now I use diag.ind to get the indices and then use those indices to insert the values on the k-th diagonal.
Here is the code (it uses rbind.fill.matrix from the plyr package):
'diag.ind'<-function(x,k=0){
if(k=='') k=0
x<-as.matrix(x)
if(dim(x)[2]==dim(x)[1]){
stp_pt_r<-dim(x)[1]
stp_pt_c<-dim(x)[2]
}
if(ncol(x)> dim(x)[1]){
stp_pt_r<-dim(x)[1]
stp_pt_c<-stp_pt_r + 1
}
if(ncol(x)< dim(x)[1]){
stp_pt_c<-dim(x)[2]
stp_pt_r<-stp_pt_c+1
}
if(k==0){
r<-as.matrix(seq(1,stp_pt_r,by=1))
c<-as.matrix(seq(1,stp_pt_c,by=1))
ind.r<- cbind(r,c)
}
if(k>0){
r<-t(as.matrix(seq(1,stp_pt_r,by=1)))
c<-t(as.matrix(seq((1+k),stp_pt_c,by=1)))
ind<-t(rbind.fill.matrix(r,c))
ind.r<-ind[!is.na(ind[,2]),]
}
if(k<0){
k<-abs(k)
r<-t(as.matrix(seq((1+k),stp_pt_r,by=1)))
c<-t(as.matrix(seq(1,stp_pt_c,by=1)))
ind<-t(rbind.fill.matrix(r,c))
ind.r<-ind[!is.na(ind[,1]),]
}
diag.x<-x[ind.r]
output<-list(diag.x=diag.x, diag.ind=ind.r)
return(output)
}
This is kind of clunky and I feel like I must be reinventing the wheel. Thanks in advance for any insight!
After your reply to Andrie this may satisfy:
exdiag <- function(mat, off) {mat[row(mat)+off == col(mat)]}
x <- matrix(1:16, ncol=4)
exdiag(x,1)
#[1] 5 10 15
I was thinking you wanted a function that can assign or return a diagonal, sub-diagonal or super-diagonal matrix. This is the constructor function:
subdiag <- function(vec, size, offset=0){
M <- matrix(0, size, size)
M[row(M)-offset == col(M)] <- vec
return(M)}
> subdiag(1, 5, 1)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 1 0 0 0 0
[3,] 0 1 0 0 0
[4,] 0 0 1 0 0
[5,] 0 0 0 1 0
Called with only two arguments you would get a diagonal matrix. You can construct super-diagonal matrices with negative offsets. If this is what you wanted for the constructor, then it should not be too hard to construct a similar subdiag<- function to go along with it.
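A replacement function along those lines might look like the following minimal sketch. The name `exdiag<-` and its off argument simply mirror the exdiag getter above; they are illustrative and not part of base R.

```r
# Getter: elements on the diagonal shifted `off` columns to the right
exdiag <- function(mat, off = 0) mat[row(mat) + off == col(mat)]

# Replacement form, so that exdiag(x, off) <- value works like diag(x) <- value
`exdiag<-` <- function(mat, off = 0, value) {
  mat[row(mat) + off == col(mat)] <- value
  mat
}

x <- matrix(1:16, ncol = 4)
exdiag(x, 1) <- 0                   # overwrite the first superdiagonal
exdiag(x, -1) <- c(100, 200, 300)   # overwrite the first subdiagonal
```

Because R rewrites exdiag(x, 1) <- 0 as x <- `exdiag<-`(x, 1, value = 0), the last argument has to be called value.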
In MATLAB, to assign the values x to the diagonal of A:
n = size(A,1);
A(1:n+1:end) = x
Look up linear indexing.
Although, that might not be what you asked.
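The same linear-indexing idea carries over to R, since R matrices are also stored column by column. A rough sketch (the variable names idx and idx_k are made up for illustration):

```r
A <- matrix(0, 4, 4)
n <- nrow(A)

# Linear (column-major) indices of the main diagonal: 1, n + 2, 2n + 3, ...
idx <- seq(1, by = n + 1, length.out = min(dim(A)))
A[idx] <- 1:4

# k-th superdiagonal: start k columns further right, i.e. k * n elements later
k <- 1
idx_k <- seq(1 + k * n, by = n + 1, length.out = min(n, ncol(A) - k))
A[idx_k] <- 9
```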
Good afternoon,
I have developed this R function that hashes data into buckets:
# The used packages
library("pacman")
pacman::p_load(dplyr, tidyr, devtools, MASS, pracma, mvtnorm, interval, intervals)
pacman::p_load(sprof, RDocumentation, helpRFunctions, foreach , philentropy , Rcpp , RcppAlgos)
hash<-function(v,p){
if(dot(v,p)>0) return(1) else (0) }
LSH_Band<-function(data,K ){
# We retrieve numerical columns of data
t<-list.df.var.types(data)
df.r<-as.matrix(data[c(t$numeric,t$Intervals)])
n=nrow(df.r)
# create a K*K matrix of standard normal draws
rn=array(rnorm(K*K,0,1),c(K,K))
# create a K*K matrix of integers from a uniform law; integers are unique within each column
rd=unique.array(array(unique(ceiling(runif(K*K,0,ncol(df.r)))),c(K,K)))
buckets<-array(NA,c(K,n))
for (i in 1:K) {
for (j in 1:n) {
buckets[i,j]<-hash(df.r[j,][rd[,i]],rn[,i])
}
}
return(buckets)
}
> df.r
age height salaire.1 salaire.2
1 27 180 0 5000
2 26 178 0 5000
3 30 190 7000 10000
4 31 185 7000 10000
5 31 187 7000 10000
6 38 160 10000 15000
7 39 158 10000 15000
> LSH_Band(df.r, 3 )
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 1 1 1 1 1
[2,] 1 1 0 0 0 0 0
[3,] 0 0 0 0 0 0 0
The dot function computes the scalar product of two vectors.
My LSH function takes a row of my data, then takes part of that row using df.r[j,][rd[,i]], where df.r[j,] is the j-th row of the data.
rd[,i]: rd is a K*K matrix of integers between 1 and ncol(df.r); each column of the matrix contains only unique integers.
rn[,i]: rn is a K*K matrix of values drawn from the N(0,1) law.
In the resulting table, observations are in columns and there are K rows. For the last row, I compute the scalar product of df.r[j,][rd[,K]] and rn[,K], and obtain 1 if the scalar product is positive (0 otherwise). rd[,K] and rn[,K] are used only for the last row of the resulting table, for all observations in that row.
My question:
Is it possible to replace the loops over i and j with an lapply call?
My real data will be large, which is why I'm asking.
Thank you!
The following is a bit too long as a comment, so here are some pointers/issues/remarks:
First off, I have to say I struggle to understand what LSH_Band does. Perhaps some context would help here.
I don't understand the purpose of certain functions like helpRFunctions::list.df.var.types, which simply seems to return the column names of data in a list. Note also that t$Intervals returns NULL based on the sample data you give. So I'm not sure what's going on there.
I don't see the point of function pracma::dot either. The dot product between two vectors can be calculated in base R using %*%. There's really no need for an additional package.
Function hash can be written more compactly as
hash <- function(v, p) +(as.numeric(v %*% p) > 0)
This avoids the explicit if/else branch and returns 0 or 1 directly.
Notwithstanding my lack of understanding what it is you're trying to do, here are some tweaks to your code
hash <- function(v, p) +(as.numeric(v %*% p) > 0)
LSH_Band <- function(data, K, seed = NULL) {
# We retrieve numerical columns of data
data <- as.matrix(data[sapply(data, is.numeric)])
# create a K*K matrix of standard normal draws
if (!is.null(seed)) set.seed(seed)
rn <- matrix(rnorm(K * K, 0, 1), nrow = K, ncol = K)
# create a K*K matrix of integers; integers are unique within each column
rd <- sapply(seq_len(K), function(col) sample.int(ncol(data), K))
buckets <- matrix(NA, nrow = K, ncol = nrow(data))
for (i in 1:K) {
buckets[i, ] <- apply(data, 1, function(row) hash(row[rd[, i]], rn[, i]))
}
buckets
}
Always add an option to use a reproducible seed when working with random numbers. That will make debugging a lot easier.
You can replace at least one for loop with apply, which with MARGIN = 1 iterates over the rows of a matrix (or array).
I've removed all the unnecessary package dependencies, and replaced the functionality with base R functions.
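Going one step further, the inner per-row apply can be replaced by a single matrix product per band. The following is only a sketch under the same assumptions as the code above (data is a data frame and K does not exceed the number of numeric columns); the name LSH_Band_vec is made up for illustration.

```r
LSH_Band_vec <- function(data, K, seed = NULL) {
  # Keep the numeric columns only
  data <- as.matrix(data[sapply(data, is.numeric)])
  if (!is.null(seed)) set.seed(seed)
  rn <- matrix(rnorm(K * K, 0, 1), nrow = K, ncol = K)
  rd <- sapply(seq_len(K), function(col) sample.int(ncol(data), K))
  # For band i, data[, rd[, i]] is n x K and rn[, i] has length K, so one
  # matrix product yields the scalar products for all n observations at once
  buckets <- vapply(
    seq_len(K),
    function(i) as.integer(data[, rd[, i], drop = FALSE] %*% rn[, i] > 0),
    integer(nrow(data))
  )
  # Arrange as K rows (bands) by n columns (observations)
  matrix(buckets, nrow = K, ncol = nrow(data), byrow = TRUE)
}

# Sample data from the question
df <- data.frame(
  age       = c(27, 26, 30, 31, 31, 38, 39),
  height    = c(180, 178, 190, 185, 187, 160, 158),
  salaire.1 = c(0, 0, 7000, 7000, 7000, 10000, 10000),
  salaire.2 = c(5000, 5000, 10000, 10000, 10000, 15000, 15000)
)
b <- LSH_Band_vec(df, 3, seed = 1)
```

The random numbers are drawn in the same order as in the loop version, so with the same seed both should produce the same buckets.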
I have the following problem:
I have a list of matrices containing indices.
Each column of a matrix lists the row indices that should be equal to 1 in the corresponding column of the output.
All other values should be equal to 0.
I know the size of the output matrices, and there are no duplicated values within a column.
For example the following matrix should be translated as follows:
m_in = matrix(c(1,3,5,7,3,4), nrow =2)
m_out = matrix(c(1,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,0), nrow = 7)
I wrote code that works, but it would be great if I could achieve this without loops, in a more efficient/clever way.
Index <- matrix(20, 100, data = sample(1:200))
Vector <- c(2,3,5,8,20)
ListIndices <- sapply(Vector, function(x)Index[0:x,])
emptylistlist <- list()
for (i in 1: length(ListIndices)){
for (j in 1 : 100){
emptylistlist[[i]] <- matrix(nrow = 200, ncol = 100, data = 0)
emptylistlist[[i]][ListIndices[[i]],j]<-1
}
}
We can try sparseMatrix from library(Matrix) and then wrap it with as.matrix.
library(Matrix)
as.matrix(sparseMatrix(i= c(m_in), j= c(col(m_in)), x=1))
# [,1] [,2] [,3]
#[1,] 1 0 0
#[2,] 0 0 0
#[3,] 1 0 1
#[4,] 0 0 1
#[5,] 0 1 0
#[6,] 0 0 0
#[7,] 0 1 0
If there is a list of matrices, then we can use lapply
lapply(lst, function(y) as.matrix(sparseMatrix(i= c(y), j= c(col(y)), x= 1)))
The typical way is with matrix assignment:
m_out = matrix(0L, max(m_in), ncol(m_in))
m_out[cbind(c(m_in), c(col(m_in)))] <- 1L
How it works: The syntax for matrix assignment M[IND] <- V is described at help("[<-").
Each row of IND is a pair of (row, column) positions in M.
Elements of M at those positions will be overwritten with (corresponding elements of) V.
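To make the index matrix concrete, here is a small sketch using m_in from the question. The intermediate name ind is introduced only for illustration, and the number of output rows (7) is taken from the question's example.

```r
m_in <- matrix(c(1, 3, 5, 7, 3, 4), nrow = 2)

# Each row of `ind` is one (row, column) position that should become 1:
# (1,1), (3,1), (5,2), (7,2), (3,3), (4,3)
ind <- cbind(c(m_in), c(col(m_in)))

m_out <- matrix(0L, 7, ncol(m_in))
m_out[ind] <- 1L
```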
As far as the list of matrices goes, an array would be more natural:
set.seed(1)
Index <- matrix(20, 100, data = sample(1:200))
Vector <- c(2,3,5,8,20)
idx <- sapply(Vector, function(x)Index[0:x,])
# "ListIndices" is too long a name
a_out = array(0L, dim=c(
max(unlist(idx)),
max(sapply(idx,ncol)),
length(idx)))
a_out[ cbind(
unlist(idx),
unlist(lapply(idx,col)),
rep(seq_along(idx),lengths(idx))
)] <- 1L
The syntax is the same as for matrix assignment.
Seeing as the OP has so many zeros and so few ones, a sparse matrix, as in @akrun's answer, makes the most sense, or a sparse array, if such a thing has been implemented.
I suppose this is trivial, but I can't find how to declare a vector of zeros in R.
For example, in Matlab, I would write:
X = zeros(1,3);
You have several options
integer(3)
numeric(3)
rep(0, 3)
rep(0L, 3)
You can also use the matrix command to create a matrix with n rows and m columns, filled with zeros.
matrix(0, n, m)
replicate is another option:
replicate(10, 0)
# [1] 0 0 0 0 0 0 0 0 0 0
replicate(5, 1)
# [1] 1 1 1 1 1
To create a matrix:
replicate( 5, numeric(3) )
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 0 0 0 0
#[2,] 0 0 0 0 0
#[3,] 0 0 0 0 0
X <- c(1:3)*0
Maybe this is not the most efficient way to initialize a vector to zero, but it only requires remembering the c() function, which is very frequently cited in tutorials as the usual way to declare a vector.
As a side note: to someone learning her way into R from other languages, the multitude of functions that do the same thing in R may be mind-blowing, as demonstrated by the previous answers here.
Here are four ways to create a one-dimensional vector with zeros - then check if they are identical:
numeric(2) -> a; double(2) -> b; vector("double", 2) -> c; vector("numeric", 2) -> d
all(sapply(list(b, c, d), identical, a)) # identical() compares only two objects at a time
In the iteration chapter of R for Data Science they use the vector("double", n) form (option "c" above) to create this type of vector.
If I have a matrix:
x <- matrix(c(0), ncol=2, nrow=2)
x
[,1] [,2]
[1,] 0 0
[2,] 0 0
and I want to change x[,2][2] to a 1 instead of a 0 and save that in a new matrix y so that the output would be:
y
[,1] [,2]
[1,] 0 0
[2,] 0 1
how do I do this in R?
If you don't want to modify the original matrix, you could reverse the order of the operations that you want to perform, meaning that you first store a copy of the matrix x in a new variable y and then manipulate the entries of the matrix y.
y <- x
y[2,2] <- 1
Else, if you really want to change x and afterwards store a copy of the modified matrix in y... well, I guess that the changes are rather obvious: that would be x[2,2] <-1 followed by y <- x.
This is a two-part problem: the first is to create an N x N square matrix in which exactly one random element in each row is 1 and the other elements are zero (i.e. the sum of the elements in each row is 1).
The second is to create an N x N square matrix in which the sum of the elements in each row is 1, but each element follows a distribution, e.g. the normal distribution.
Related questions include (Create a matrix with conditional sum in each row -R)
Matlab seems to do what I want automatically (Why this thing happens with random matrix such that all rows sum up to 1?), but I am looking for a solution in R.
Here is what I tried:
# PART 1
N <- 50
x <- matrix(0,N,N)
lapply(1:N, function(y){
x[y,sample(N,1)]<- 1
})
(I get zeroes still)
# PART 2
N <- 50
x <- matrix(0,N,N)
lapply(1:N, function(y){
x[y,]<- rnorm(N)
})
(It needs scaling)
Here's another loop-less solution that uses two-column matrix indexing with the "[<-" function. It builds a two-column index matrix whose first column is simply an ascending series giving the row positions, and whose second column (the one that picks the column positions) is a random integer. (It's a vectorized version of Matthew's "easiest method", and I suspect it would be faster since there is only one call to sample.):
M <- matrix(0,N,N)
M[ cbind(1:N, sample(1:N, N, rep=TRUE))] <- 1
> rowSums(M)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
If you had not specified rep=TRUE, colSums(M) would have been all ones as well, but that was not what you requested. Sampling with replacement does mean the rank of the resulting matrix may be less than N; leaving out rep=TRUE would give a full-rank (permutation) matrix.
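The difference between the two sampling modes can be checked directly. This is a small sketch with an arbitrary seed and a small N chosen only to keep it readable; the names with_rep and perm are illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
N <- 6

# With replacement: one 1 per row, but columns may be hit several times
with_rep <- matrix(0, N, N)
with_rep[cbind(1:N, sample(1:N, N, replace = TRUE))] <- 1

# Without replacement: the column picks are a permutation of 1:N
perm <- matrix(0, N, N)
perm[cbind(1:N, sample(1:N, N))] <- 1

rowSums(with_rep)  # all ones
rowSums(perm)      # all ones
colSums(perm)      # all ones as well
qr(perm)$rank      # N, i.e. full rank
```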
Here you see why lapply doesn't always replace a loop. You're trying to iterate through the rows of x and modify the matrix, but what you're modifying is a copy of the x from the global environment.
The easiest fix is to use a for loop:
for (y in 1:N) {
x[y,sample(N,1)]<- 1
}
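The copy behaviour can be seen in a short sketch (small N chosen for illustration; after_lapply is a made-up name):

```r
N <- 5
x <- matrix(0, N, N)

# The assignment inside the anonymous function creates a local copy of x;
# the x in the calling environment is left untouched
invisible(lapply(1:N, function(y) x[y, sample(N, 1)] <- 1))
after_lapply <- sum(x)   # still 0

# A for loop runs in the calling environment, so x itself is modified
for (y in 1:N) {
  x[y, sample(N, 1)] <- 1
}
rowSums(x)               # one 1 in every row
```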
The apply family should be used for its return value, rather than for functions with side effects.
A way to do this is to return the rows, then rbind them into a matrix. The second part of the question is shown here, as it more closely resembles an apply:
do.call(rbind, lapply((1:N), function(i) rnorm(N)))
However, this is more readable:
matrix(rnorm(N*N), N, N)
Now to scale this to have row sums equal to 1. You use the fact that a matrix is column-oriented and that vectors are recycled, meaning that you can divide a matrix M by rowSums(M). Using a more reasonable N=5:
m <- matrix(rnorm(N*N), N, N)
m/rowSums(m)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.1788692 0.5398464 0.24980924 -0.01282655 0.04430168
## [2,] 0.4176512 0.2564463 0.11553143 0.35432975 -0.14395871
## [3,] 0.3480568 0.7634421 -0.38433940 0.34175983 -0.06891932
## [4,] 1.1807180 -0.0192272 0.16500179 -0.31201400 -0.01447859
## [5,] 1.1601173 -0.1279919 -0.07447043 0.20865963 -0.16631458
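A quick way to convince yourself that the recycling lines up with the rows is to compare against sweep, which states the row-wise division explicitly. A sketch with an arbitrary seed (scaled and scaled2 are illustrative names):

```r
set.seed(42)  # arbitrary seed, only for reproducibility
N <- 5
m <- matrix(rnorm(N * N), N, N)

# rowSums(m) has length N and is recycled down each column,
# so element [i, j] is divided by the sum of row i
scaled <- m / rowSums(m)

# The same operation spelled out with sweep over the row margin
scaled2 <- sweep(m, 1, rowSums(m), "/")

all.equal(scaled, scaled2)
all.equal(rowSums(scaled), rep(1, N))
```

Note that with normally distributed entries the scaled values can still be negative or larger than 1, as in the output above; only the row sums are constrained.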
No-loop solution :)
n <- 5
# column in which to insert a 1, for each row
s <- sample(n,n,TRUE)
# offset of each row's first element in the row-major vector
w <- seq(1,n*n,by=n)-1
index <- s+w
# vector of 0s
vec <- integer(n*n)
# put 1s
vec[index] <- 1
# fill the matrix row by row
matrix(vec,n,byrow = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 0 0
[2,] 0 0 0 1 0
[3,] 0 0 0 0 1
[4,] 1 0 0 0 0
[5,] 1 0 0 0 0