I have an all-zero sparse matrix K1 with dimensions (9x3). I want to replace certain values of this matrix with another matrix. Also, instead of hard-coded numeric indices, I use index expressions written in terms of variables to make it more dynamic. The code is as follows -
n <- 3
library(Matrix)
K1 <- Matrix(0, n*n, n*(n-1)/2, sparse = TRUE)
for (i in 1:(n - 1)) {
K1[2 + (i - 1)*(n + 1):i*n,
1 + (i - 1)*(n - i/2):i*(n - i)*(i + 1)/2] <- diag(n - i)
}
However, it shows the error -
Error in replCmat4(x, i1 = if (iMi) 0:(di[1] - 1L) else .ind.prep2(i, :
too many replacement values
Sometimes this error as well -
Error in intI(i, n = di[margin], dn = dn[[margin]], give.dn = FALSE) :
index larger than maximal 9
But when I run similar code in MATLAB, it runs perfectly. MATLAB code -
n = 3
K1 = sparse(n*n,n*(n-1)/2);
for i = 1:n-1
K1(2+(i-1)*(n+1):i*n,1+(i-1)*(n-i/2):i*n-i*(i+1)/2) = eye(n-i);
end
And the output which MATLAB gives is -
K1 =
(2,1) 1.00
(3,2) 1.00
(6,3) 1.00
Thus, the above is my desired output as well.
Can someone tell me what is going wrong when I try to execute the same in R?
I appreciate the help. Thanks.
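Please put each index bound in a pair of brackets, otherwise the expressions are interpreted differently in R and MATLAB: in R the : operator binds more tightly than * and +, while in MATLAB it binds less tightly.
K1[(2+(i-1)*(n+1)):(i*n), (1+(i-1)*(n-i/2)):(i*(n-i)*(i+1)/2)]
Note that the upper column bound in your R code, i*(n-i)*(i+1)/2, is not the same expression as MATLAB's i*n-i*(i+1)/2; the two agree for n = 3 but not in general.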
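To make the precedence point concrete, here is a minimal sketch. It first shows what the unbracketed row index evaluates to for i = 1, then runs the loop with every bound bracketed. The variable names rows and cols are introduced here only for readability, and the column upper bound follows the MATLAB expression i*n - i*(i+1)/2, which is an assumption about the intended formula.

```r
library(Matrix)

n <- 3
i <- 1

# Without brackets, ":" is evaluated before "*" and "+":
# (n + 1):i is 4:1, which is then multiplied by (i - 1) = 0 and by n,
# so the "row index" is a vector of four 2s rather than 2:3.
bad_rows  <- 2 + (i - 1) * (n + 1):i * n
good_rows <- (2 + (i - 1) * (n + 1)):(i * n)

# Loop with every bound bracketed explicitly
K1 <- Matrix(0, n * n, n * (n - 1) / 2, sparse = TRUE)
for (i in 1:(n - 1)) {
  rows <- (2 + (i - 1) * (n + 1)):(i * n)
  cols <- (1 + (i - 1) * (n - i / 2)):(i * n - i * (i + 1) / 2)
  K1[rows, cols] <- diag(n - i)
}

# Positions of the non-zero entries
nz <- which(as.matrix(K1) != 0, arr.ind = TRUE)
print(nz)
```

For n = 3 the non-zero entries land at (2,1), (3,2) and (6,3), the same positions as in the MATLAB output quoted in the question.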
Hey everyone, I have a large matrix X with dimensions (645x7095). I want to take a subset of this matrix and replace its values with another matrix that I have created. The R code is as follows -
# install.packages("Matrix") # only needed once, if Matrix is not installed yet
library(Matrix) # kronecker() is part of base R, which is always loaded
T = 215
n = 3
k = 33
X = matrix(0,T*n,T*k)
IN = diag(n)
K1 = Matrix(0, n*n, n*(n-1)/2, sparse = TRUE)
for(i in 1:(n-1)){
K1[(2+(i-1)*(n+1)):(i*n), (1+(i-1)*(n-i/2)):(i*(n-i)*(i+1)/2)] <- diag(n-i)
}
yin = matrix(rnorm(645), ncol = 3)
Xu = matrix(rnorm(2150), ncol = 10)
# Up to this point I have defined the variables and matrices which will be used in subsetting.
The code above works fine; however, the code below throws an error -
#Loop for X subsetting
for(i in 1:T){
X[(((i-1)*n)+1):(i*n), (((i-1)*k)+1):(i*k)] <- cbind( (t(kronecker(yin[i,],IN))%*%K1) , (t(kronecker(Xu[i,],IN))))
}
# Here kronecker() computes the Kronecker tensor product of two matrices A and B. It is part of base R, so no extra package is needed.
When I run the code above, the error shown is -
Error in X[(((i - 1) * n) + 1):(i * n), ] <- cbind((t(kronecker(yin[i, :
number of items to replace is not a multiple of replacement length
However, when I run the same command in MATLAB, it works perfectly fine. MATLAB code -
X = zeros(T*n,T*k);
for i = 1:T
X((i-1)*n+1:i*n,(i-1)*k+1:i*k) = [kron(yin(i,:),IN)*K1, kron(Xu(i,:),IN)];
end
MATLAB fills in the values in the rows and columns defined in the loop for subsetting X. I have attached a snapshot of the desired output that MATLAB gives. However, R throws an error for the same operation.
Can someone enlighten me as to where I am going wrong with the R code?
I appreciate the help, many thanks.
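I think the problem is how the class 'dgeMatrix' is handled: the cbind() result is a Matrix-package object rather than a base matrix. Try converting it with as.matrix():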
for (i in 1:T) {
X[(((i-1)*n)+1):(i*n), (((i-1)*k)+1):(i*k)] <- as.matrix(cbind((t(kronecker(yin[i,],IN))%*%K1) , (t(kronecker(Xu[i,],IN)))))
}
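A small sketch of a single loop iteration may help to see what the as.matrix() call does. The vectors y and xu are made-up stand-ins for yin[i, ] and Xu[i, ], and the column upper bound in the K1 loop uses the MATLAB expression i*n - i*(i+1)/2, which is an assumption (it agrees with the R expression above for n = 3).

```r
library(Matrix)

n <- 3
k <- 33
IN <- diag(n)

# Sparse K1, built as in the question
K1 <- Matrix(0, n * n, n * (n - 1) / 2, sparse = TRUE)
for (i in 1:(n - 1)) {
  K1[(2 + (i - 1) * (n + 1)):(i * n),
     (1 + (i - 1) * (n - i / 2)):(i * n - i * (i + 1) / 2)] <- diag(n - i)
}

set.seed(42)
y  <- rnorm(3)   # hypothetical stand-in for yin[i, ]
xu <- rnorm(10)  # hypothetical stand-in for Xu[i, ]

# Multiplying by the sparse K1 yields a Matrix-package object,
# so the cbind() result is not a base matrix either
block <- cbind(t(kronecker(y, IN)) %*% K1, t(kronecker(xu, IN)))
print(class(block))

# Converting to a base matrix makes the block assignable into X
block_base <- as.matrix(block)
X <- matrix(0, n, k)
X[1:n, 1:k] <- block_base
print(dim(block_base))
```

The block has n rows and k = 3 + 30 columns, which is why it fits the n x k slice of X once it is a plain matrix.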
I want to convert R code to MATLAB (not execute the R code in MATLAB).
The code in R is as follows:
data_set <- read.csv("lab01_data_set.csv")
# get x and y values
x <- data_set$x
y <- data_set$y
# get number of classes and number of samples
K <- max(y)
N <- length(y)
# calculate sample means
sample_means <- sapply(X = 1:K, FUN = function(c) {mean(x[y == c])})
# calculate sample deviations
sample_deviations <- sapply(X = 1:K, FUN = function(c) {sqrt(mean((x[y == c] - sample_means[c])^2))})
To implement it in MATLAB I wrote the following:
%% Reading Data
% read data into memory
X=readmatrix("lab01_data_set(ViaMatlab).csv");
% get x and y values
x_read=X(1,:);
y_read=X(2,:);
% get number of classes and number of samples
K = max(y_read);
N = length(y_read);
% Calculate sample mean - 1st method
% funct1 = @(c) mean(c);
% G1 = findgroups(y_read);
% sample_mean = splitapply(funct1, x_read, G1)
% Calculate sample mean - 2nd method
for m = 1:K
sample_mean(1,m) = mean(x_read(y_read == m)); % mean of x within class m
end
sample_mean;
% Calculate sample deviation - 2nd method
for m = 1:K
class_mean = mean(x_read(y_read == m)); % scalar, so sample_mean is not overwritten
sample_deviation(1,m) = sqrt(mean((x_read(y_read == m) - class_mean).^2));
sample_mean1(1,m) = class_mean;
end
sample_deviation;
sample_mean1;
As you can see, I know how to use a for loop in MATLAB instead of sapply in R (the 2nd method in the code), but I do not know how to use a function (possibly splitapply or some other).
PS: I do not know how to upload the data, so sorry for that part.
The MATLAB equivalent to R sapply is arrayfun - and its relatives cellfun, structfun and varfun depending on what data type your input is.
For example, in R:
> sapply(1:3, function(x) x^2)
[1] 1 4 9
is equivalent to MATLAB:
>> arrayfun(@(x) x^2, 1:3)
ans =
1 4 9
Note that if the result of the function you pass to arrayfun, cellfun etc. doesn't have identical type or size for every input, you'll need to specify 'UniformOutput', false.
I'm trying to optimize a function from the spdep package in R for my use case, since it is very slow for large data sets. I was doing mostly fine, but I got stuck at one point, where I am trying to find the trace of my weights matrix for the LM error test. I think the formula is tr[(W' + W) W] (page 82 of Anselin, L., Bera, A. K., Florax, R. and Yoon, M. J. 1996 Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics, 26, 77–104.) W is a square weights matrix, holding the spatial relation of each observation with every other observation. The tr() operation is the sum of the diagonal elements.
In my case, the weights matrix is symmetric and the diagonal elements are zero. So I thought the formula tr[(W' + W) W] equals 2*sumsq(W), which is super fast. But apparently I am mistaken somewhere, because the results do not match the results of the spdep library, which is likely to be right.
The relevant part of the spdep library is below. Can anybody help me understand how the result of the following function differs from 2*sumsq(W), or how to make it much faster? This function is where the lm.LMtests function gets clogged for large data sets.
tracew <- function (listw) {
dlmtr <- 0
n <- length(listw$neighbours)
if (n < 1) stop("non-positive n")
ndij <- card(listw$neighbours)
dlmtr <- 0
for (i in 1:n) {
dij <- listw$neighbours[[i]]
wdij <- listw$weights[[i]]
for (j in seq(length=ndij[i])) {
k <- dij[j]
# Luc Anselin 2006-11-11 problem with asymmetric listw
dk <- which(listw$neighbours[[k]] == i)
if (length(dk) > 0L && dk > 0L &&
dk <= length(listw$neighbours[[k]]))
wdk <- listw$weights[[k]][dk]
else wdk <- 0
dlmtr <- dlmtr + (wdij[j]*wdij[j]) + (wdij[j]*wdk)
}
}
dlmtr
}
Additional explanation for those who are not familiar with the spdep library of R:
The input of the function, listw, holds a "graph" implementation of the weights matrix as two lists of lists. listw$neighbours is a list where each item holds the indices of the observations that the given observation is related to. listw$weights is a list with the same structure as listw$neighbours, except that it holds the weights of those relations.
Thanks in advance for any comments and directions.
# example code
# initialize
library(spdep)
library(multiway) # for sumsq()
library(geosphere) # for distm()
# load the tracew function above
data(columbus)
columbus = columbus[rep(row.names(columbus), 20), ] # the difference becomes dramatic when n is high. try not replicating at first to see the results.
# manual calculation, using sumsq
w = distm(cbind(columbus$X, columbus$Y))
w[w > 1000000] = Inf # remove some relations acc. to pre-defined rule
w = 1/(1+w)
diag(w) = 0
w = w / (sum(w) / length(columbus$X)) #"C style" standardization
2*sumsq(w)
# spdep calculation
neighs.band = dnearneigh(cbind(columbus$X, columbus$Y), 0, 1000, longlat = TRUE)
w.spdep = lapply(nbdists(neighs.band, cbind(columbus$X, columbus$Y), longlat = TRUE), function(x) 1/(0.001+x))
my.listw = nb2listw(neighs.band, glist = w.spdep, style="C")
tracew(my.listw)
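As a sanity check on the algebra, the following sketch compares the double loop with matrix expressions on a small made-up symmetric weights matrix. It does not use spdep; the neighbours and weights lists below are hand-built stand-ins that only mimic the structure described above, so this says nothing about how the two example calculations differ in their inputs.

```r
# Hypothetical symmetric weights matrix with zero diagonal
set.seed(1)
n <- 6
W <- matrix(runif(n * n), n, n)
W <- (W + t(W)) / 2   # force symmetry
diag(W) <- 0
W[W < 0.4] <- 0       # drop weak links; symmetry is preserved

# Neighbour-list representation (stand-in for listw$neighbours / listw$weights)
neighbours <- lapply(seq_len(n), function(i) which(W[i, ] != 0))
weights    <- lapply(seq_len(n), function(i) W[i, neighbours[[i]]])

# Double loop in the spirit of tracew(): sum of w_ij^2 + w_ij * w_ji
trace_loop <- 0
for (i in seq_len(n)) {
  for (j in seq_along(neighbours[[i]])) {
    k   <- neighbours[[i]][j]
    dk  <- which(neighbours[[k]] == i)
    wki <- if (length(dk) > 0L) weights[[k]][dk] else 0
    trace_loop <- trace_loop + weights[[i]][j]^2 + weights[[i]][j] * wki
  }
}

# tr[(W' + W) W] = sum(W * W) + sum(W * t(W)), without forming the product
trace_elementwise <- sum(W * W) + sum(W * t(W))
trace_matrix      <- sum(diag((t(W) + W) %*% W))
trace_symmetric   <- 2 * sum(W^2)   # valid only when W is symmetric

print(c(trace_loop, trace_elementwise, trace_matrix, trace_symmetric))
```

On this toy input all four numbers agree, so for a symmetric W the shortcut 2 * sum(W^2) matches the loop. If the two results in the example above differ, it may be worth checking whether both calculations really start from the same weights.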
I would like to fill my empty matrices using a loop.
First I created my empty matrices, which works fine:
for(q in (15:30)){
assign(paste0("P",q), matrix(, nrow = q, ncol = q+1))}
But now when I want to fill these matrices with my formula, I get a dimension error:
for(c in (1:q+1)){
for(i in (1:q)){assign(paste0("P",q)[i,c],
((((((q-c) + 1 -(q-c+1- i))/q)^.69)/(((((q-c) + 1 - (q-c+1-i))/q)^.69+(((1 - ((q-c) + 1 -(q-c+1-i))/q))^.69))^(1/.69))) - (((((q-c)-(q-c+1-i))/q)^.69)/(((((q-c) - (q-c+1-i))/q)^.69+(((1 - ((q-c)-(q-c+1-i))/q))^.69))^(1/.69)))))}}}
Nevertheless, when I use this loop for a single matrix it works, e.g.:
t <- 20
c <- 1
i <- 1
for(c in (1:t+1)){
for(i in (1:t)){P20[i,c]<-( (((((t-c) + 1 -(t-c+1-i))/t)^.69)/
(((((t-c) + 1 - (t-c+1-i))/t)^.69+(((1 - ((t-c) + 1 -(t-c+1-i))/t))^.69))^(1/.69))) -
(((((t-c)-(t-c+1-i))/t)^.69)/(((((t-c) - (t-c+1-i))/t)^.69+(((1 - ((t-c)-(t-c+1-i))/t))^.69))^(1/.69))))}}
The formula gives probability weights according to Cumulative Prospect Theory, if anyone is interested.
Do you guys have an idea how I can make this more elegant? Would it be better to write a user-defined function?
If you are happy with your resultant matrices being in a list with the same names you were assigning to you could do something like:
l = lapply(15:30, function(q){
t = q
matrix(apply(expand.grid(1:q,1:(q+1)),1,
function(x){
i = x[1]
c = x[2]
( (((((t-c) + 1 -(t-c+1-i))/t)^.69)/
(((((t-c) + 1 - (t-c+1-i))/t)^.69+(((1 - ((t-c) + 1 -(t-c+1-i))/t))^.69))^(1/.69))) -
(((((t-c)-(t-c+1-i))/t)^.69)/(((((t-c) - (t-c+1-i))/t)^.69+(((1 - ((t-c)-(t-c+1-i))/t))^.69))^(1/.69))))
}), nrow = q, ncol = q+1) # expand.grid varies i fastest, so the default column-wise fill puts (i, c) in row i, column c
})
names(l) = paste0("P",15:30)
I have used bits like t = q and i = x[1]; c = x[2] so that I could just copy and paste your formula for the probability.
What we are doing here is using lapply to loop over the matrix sizes given in your question; we then use expand.grid to give the pairs of indices for all cells in the resultant matrix. To these indices we apply a function which, given row i and column c, calculates the probability according to your formula. The values are then cast as a matrix so that the result has the appropriate structure.
You end up with a list l of matrices with components called "P15", "P16", ...
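On the question of a user-defined function, one possible sketch is below. The helper names w, cell and make_P are made up for illustration; the exponent 0.69 and the index arithmetic are copied from the formula in the question, and outer() is used in place of expand.grid plus apply.

```r
# Weighting function that appears twice in the formula; gamma = 0.69 as in the question
w <- function(p, gamma = 0.69) {
  p^gamma / (p^gamma + (1 - p)^gamma)^(1 / gamma)
}

# Entry for row i, column c of the q x (q + 1) matrix,
# with the index arithmetic written as in the question
cell <- function(i, c, q) {
  upper <- ((q - c) + 1 - (q - c + 1 - i)) / q
  lower <- ((q - c) - (q - c + 1 - i)) / q
  w(upper) - w(lower)
}

# outer() evaluates cell() on every (i, c) pair at once
make_P <- function(q) outer(1:q, 1:(q + 1), cell, q = q)

l <- lapply(15:30, make_P)
names(l) <- paste0("P", 15:30)

print(dim(l$P20))
print(round(l$P20[1:3, 1:3], 4))
```

Because cell() only uses vectorised arithmetic, outer() can fill each matrix in one call, and the formula lives in one place where it is easier to read and test.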
Updated with code that addresses most of my questions.
I have a modest function to generate iterations of an MLE for a population estimate. (I know that explicit iteration is considered poor form in R, but I am trying to show a nonlinear search procedure in detail, to accompany methods from an Excel spreadsheet.)
n <- c(32,54,37,60,41) # number of captures
R <- c(32,36,6,13,5) # of marked fish returned to the population
fn <- function(x){
N = 97 #starting value of N
mle = matrix(0, nrow=x, ncol=8) #per suggestion
colnames(mle) = c("N","g.N","h.N","N1","g.N1","h.N1","delta.h","corr") #added column names
for (i in 1:x) {
g.N = prod(1-n/N)
h.N = N-sum(R)-N*g.N
N1 = N-1
g.N1 = prod(1-n/N1)
h.N1 = N1-sum(R)-N*g.N1
delta.h = h.N-h.N1
corr = -h.N/delta.h
#print(c(N,g.N,h.N,N1,g.N1,h.N1,delta.h,corr))#original output
mle[i,] = c(N,g.N,h.N,N1,g.N1,h.N1,delta.h,corr) #per suggestion
N = N+corr
}
return(mle) #per suggestion
}
fn(5)
This creates the following output
N g.N h.N N1 g.N1 h.N1 delta.h corr
[1,] 97.00000 0.04046356 1.075034e+00 96.00000 0.03851149 0.2643856 0.8106486 -1.326141e+00
[2,] 95.67386 0.03788200 4.954192e-02 94.67386 0.03597455 -0.7679654 0.8175073 -6.060119e-02
[3,] 95.61326 0.03776543 2.382189e-03 94.61326 0.03586008 -0.8154412 0.8178234 -2.912841e-03
[4,] 95.61035 0.03775983 1.147664e-04 94.61035 0.03585458 -0.8177238 0.8178386 -1.403289e-04
[5,] 95.61020 0.03775956 5.529592e-06 94.61020 0.03585432 -0.8178338 0.8178393 -6.761220e-06
I would like to clean up the output, but have not been able to crack the code to put the results in a matrix or data.frame or any format where I can give column titles and adjust the digits, numeric format, etc. in a meaningful manner. I have had limited success with cat and format, but have been unable to get them to do precisely what I would like. Any help formatting this as a table, matrix or data.frame would be appreciated.
Your function doesn't actually work for me (what's n, for example?). Anyway, you should have something like:
fn <- function(x) {
N <- 97 # starting value of N
m <- matrix(0, nrow = x, ncol = 7) # one row per iteration
for (i in 1:x) {
#<snip>
m[i,] <- c(N, g.N, N1, g.N1, h.N1, delta.h, corr) # store this iteration's values
N <- N + corr
}
return(m)
}
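Once the iterations are in a matrix with column names, formatting becomes a separate step. The sketch below repeats the function from the question unchanged in its arithmetic, converts the result to a data.frame, and formats every column to 4 significant digits with formatC; the choice of 4 digits and the names out and out_fmt are arbitrary illustrations.

```r
n <- c(32, 54, 37, 60, 41)  # number of captures
R <- c(32, 36, 6, 13, 5)    # marked fish returned to the population

fn <- function(x) {
  N <- 97  # starting value of N
  mle <- matrix(0, nrow = x, ncol = 8)
  colnames(mle) <- c("N", "g.N", "h.N", "N1", "g.N1", "h.N1", "delta.h", "corr")
  for (i in seq_len(x)) {
    g.N     <- prod(1 - n / N)
    h.N     <- N - sum(R) - N * g.N
    N1      <- N - 1
    g.N1    <- prod(1 - n / N1)
    h.N1    <- N1 - sum(R) - N * g.N1  # kept exactly as written in the question
    delta.h <- h.N - h.N1
    corr    <- -h.N / delta.h
    mle[i, ] <- c(N, g.N, h.N, N1, g.N1, h.N1, delta.h, corr)
    N <- N + corr
  }
  mle
}

# Numeric results with column titles
out <- as.data.frame(fn(5))

# Display copy: every column formatted to 4 significant digits
out_fmt <- as.data.frame(lapply(out, formatC, digits = 4, format = "g"))
print(out_fmt, row.names = FALSE)
```

Keeping the numeric data.frame (out) separate from the formatted copy (out_fmt) means the values stay available for further calculation, while the display format can be adjusted per column if needed.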