Matrix Error in R

I'm writing a function in R to take a number and perform an operation on every row of a matrix. Each row has 50 elements, so ncol(x) is 50. Here's the calculation I'm performing:
cosineSim <- function(v1, v2) {
  a <- sum(v1 * v2)
  b <- sqrt(sum(v1 * v1)) * sqrt(sum(v2 * v2))
  return(a / b)
}
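As a quick sanity check (an added example, not taken from the data below), two proportional vectors should give a similarity of exactly 1:
v <- c(1, 2, 3)
w <- c(2, 4, 6)
cosineSim(v, w)   # 1, since w is a scalar multiple of v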
Here's a sample of my data:
the 0.41800 0.249680 -0.41242 0.121700 0.345270 -0.044457 -0.49688 -0.178620 -0.00066023 -0.656600 0.278430 -0.14767 -0.55677 0.14658 -0.0095095
. 0.15164 0.301770 -0.16763 0.176840 0.317190 0.339730 -0.43478 -0.310860 -0.44999000 -0.294860 0.166080 0.11963 -0.41328 -0.42353 0.5986800
of 0.70853 0.570880 -0.47160 0.180480 0.544490 0.726030 0.18157 -0.523930 0.10381000 -0.175660 0.078852 -0.36216 -0.11829 -0.83336 0.1191700
to 0.68047 -0.039263 0.30186 -0.177920 0.429620 0.032246 -0.41376 0.132280 -0.29847000 -0.085253 0.171180 0.22419 -0.10046 -0.43653 0.3341800
and 0.26818 0.143460 -0.27877 0.016257 0.113840 0.699230 -0.51332 -0.473680 -0.33075000 -0.138340 0.270200 0.30938 -0.45012 -0.41270 -0.0993200
in 0.33042 0.249950 -0.60874 0.109230 0.036372 0.151000 -0.55083 -0.074239 -0.09230700 -0.328210 0.095980 -0.82269 -0.36717 -0.67009 0.4290900
Here's what I'm trying to write, it's pretty straightforward:
function(w, m) {
  r <- matrix(nrow = nrow(m), ncol = 2)
  for (i in 1:(nrow(m) - 1)) {
    r[i, 1] <- cosineSim(m[w, ], m[i, ])
  }
  r <- sort(r, decreasing = TRUE)
  return(r)
}
1) Create a new matrix r with the same number of rows as m (and two columns, per the code above).
2) Loop through the first nrow(m) - 1 rows.
3) For each row, use cosineSim to compute the cosine similarity between row w and the current row of the loop.
4) Once the loop is finished, sort and return r, which holds the results.
The problem is that the algorithm seems to enter an infinite loop.
I need this to work for a project and would very much appreciate any help.

Thanks to TMin's suggestion, I was able to spot the problem in my code. I also figured out that it is not stuck in an infinite loop; rather, it is scanning ~20M elements and taking a long time to do it.
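As an aside (a sketch added here, not the poster's code): the per-row loop can be avoided entirely with matrix arithmetic, which also sidesteps the fact that sort() flattens a matrix into a plain vector. The name cosineSimAll is made up for illustration; m and w are as in the question.
cosineSimAll <- function(w, m) {
  norms <- sqrt(rowSums(m * m))             # Euclidean norm of every row
  sims <- as.vector(m %*% m[w, ]) / (norms * norms[w])
  sort(sims, decreasing = TRUE)             # a plain vector, so sort() is fine
}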

Related

R - graphics cross lines

I have a question, and honestly I do not know whether a solution really exists in R.
I have an x/y plot, and I want to draw two straight lines: (1) from the x-axis up to the data, and (2) from the y-axis across to the data. For line 1 I already have the value; it is a tertile of my data. The question is how to find the exact point at which that line intersects the data, so that I can draw the matching line along the y-axis.
I have already tried taking the position on the x-axis and using the same position for y. That even works for some data, but not all, since the values do not always match.
Here is my example:
ob<-c(77.89824, 170.36929, 90.88129, 141.22368, 174.07871,
106.51393, 94.32576, 85.31712, 78.95808, 222.30143, 115.25760,
85.84704, 165.33504, 72.06912, 38.94912, 90.88129, 167.18976,
125.85600, 141.22367, 104.65922, 131.95009, 81.07777,
64.12032,130.36032, 89.29152, 65.97504, 40.27392, 64.38529,
113.40288)
tm<-c(38.94912, 40.27392, 64.12032, 64.38529, 65.97504, 72.06912,
77.89824, 78.95808, 81.07777, 85.31712, 85.84704, 89.29152,
90.88129, 94.32576, 104.65922, 106.51393, 113.40288, 115.25760,
125.85600, 130.36032, 131.95009, 141.22367, 141.22368, 165.33504,
167.18976, 170.36929, 174.07871)
bs<-c(0.96523390, 0.93066061, 0.89634466, 0.86213300, 0.82769878,
0.79311455, 0.75831596, 0.72311471, 0.68800759, 0.65245482,
0.61700818, 0.58163643, 0.51021060, 0.47393336, 0.43788203,
0.40203755, 0.36614804, 0.33059801, 0.29408090, 0.25820874,
0.22265365, 0.18803136, 0.15444785, 0.11931985, 0.08411248,
0.05098459, 0.01957279)
prc<-c(0.956974397, 0.914559074, 0.872836231, 0.831624652,
0.790544222, 0.749700646, 0.709038330, 0.668364230, 0.628275180,
0.588180704, 0.548730882, 0.509909531, 0.433282681, 0.395329802,
0.358306283, 0.322222487, 0.286868665, 0.252670119, 0.218461386,
0.185847964, 0.154593177, 0.125303855, 0.098121311, 0.071199383,
0.046104574, 0.024746731, 0.007529233)
plot(tm,bs,type="l",col="red")
lines(tm,prc,col="black")
tinf<-quantile(ob,prob=1/3)
tsup<-quantile(ob,prob=2/3)
idxinf<-which(tm>=(tinf-5) & tm<=(tinf+5))
infgrafico<-mean(prc[idxinf])
idxsup<-which(tm>=(tsup-5) & tm<=(tsup+5))
supgrafico<-mean(prc[idxsup])
segments(tinf,0.03, tinf,infgrafico,col='black',lty=3,lwd=1)
segments(min(tm),infgrafico,
tinf,infgrafico,col='black',lty=3,lwd=1)
text(tinf,cex=1,y=0,col="black",font=2,"T1")
segments(tsup,0.03, tsup,supgrafico,col='black',lty=3,lwd=1)
segments(min(tm),supgrafico,
tsup,supgrafico,col='black',lty=3,lwd=1)
text(tsup,cex=1,y=0,col="black",font=2,"T2")
But that is the issue: sometimes the values do not correspond and are not found, so the straight lines do not cross at the data value. And yes, I need this to be as automated as possible, because I have to save those values in a table and cannot do it by hand, trial and error, one at a time.
Thanks!
EDITED AFTER ONE ANSWER
TavoGLC (or anyone who wants to), can you help with one more thing? I have problems with some cases of my data.
tm <- c(54.05184, 67.29985, 70.86991, 78.42816, 80.84780, 80.54784, 80.81280, 81.8774, 89.82144, 89.82144, 90.81314, 90.35136, 92.20607, 92.47104, 97.50528, 97.77025, 99.09504, 99.88993, 100.41985, 100.94976, 101.74465, 102.27456, 105.45408, 105.71905, 116.05248, 118.43713, 122.94144, 125.06112, NA)
prc <- c(0.9454304, 0.9604309, 0.9604309, 0.9608306, 0.9608306, 0.9608309, 0.9608309, 0.9608309, 0.9604309, 0.9605930, , 0.4163839, 0.3624935, 0.3041409, 0.2327866, 0.1079731, NA)
tercil <- 89.821
plot(tm, prc, type = "l")
abline(v = 89.821, col = "red")
Considering that the horizontal line (abline(h = ...)) can fall over a very wide range of locations, when I run the MakeLineCoords procedure I obtain values such as 2.627424, which does not agree with my data (the maximum is 1). I have a series of tm and prc data of dimension [360, 181, 29]; the example above is one of the failing cases I cut out. If you want, I can send you the complete test data.
This is the adaptation I made to run it here:
yTinf <- array(NA, c(360, 181, 29))
xTinf <- array(NA, c(360, 181, 29))
for (i in 1:360) {
  for (j in 1:181) {
    for (k in 1:29) {
      yTinf[i, j, k] <- tercil[i, j, k]
      xTinf[i, j, k] <- MakeLineCoords(prc[i, j, ], tm[i, j, ], yTinf[i, j, k])
    }
  }
}
Even so, it produces some values greater than 1 and some extrapolated ones, on the order of 2000 and 3000, which, from what I can tell, is due to the problem above; xTinf does contain some correct values.
Thank you so much!
You just have to create a function that finds the closest point in the data and interpolates from that point.
MakeLineCoords <- function(Xvals, Yvals, targetPoint) {
  # squared distance of every Y value from the target
  x <- vector(mode = "numeric", length = length(Yvals))
  for (k in 1:length(Xvals)) {
    x[k] <- (Yvals[k] - targetPoint)^2
  }
  # index of the closest point, then linear interpolation towards its neighbour
  mL <- which.min(x)
  xVal <- Xvals[mL] + (targetPoint - Yvals[mL]) *
    ((Xvals[mL + 1] - Xvals[mL]) / (Yvals[mL + 1] - Yvals[mL]))
  return(xVal)
}
Then you can define a target value to be found and create the coordinates for the lines in the plot.
yTarget <- 0.25
xTarget <- MakeLineCoords(tm, bs, yTarget)
plot(tm, bs, type = "l", col = "red")
segments(xTarget, 0.0, xTarget, yTarget, col = 'black', lty = 3, lwd = 1)
segments(min(tm), yTarget, xTarget, yTarget, col = 'black', lty = 3, lwd = 1)
Using your example data I get the following. Hope it helps:
tinf <- quantile(ob, prob = 1/3)
yTarget <- tinf
xTarget <- MakeLineCoords(bs, tm, yTarget)
plot(tm, bs, type = "l", col = "red")
segments(yTarget, 0.0, yTarget, xTarget, col = 'black', lty = 3, lwd = 1)
segments(min(tm), xTarget, yTarget, xTarget, col = 'black', lty = 3, lwd = 1)
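As an aside (a sketch added here, not part of the answer above): base R's approx() performs the same linear interpolation in one call. It drops incomplete (NA) pairs and, with the default rule = 1, returns NA outside the data range instead of extrapolating, which may also help with the out-of-range values mentioned in the edit. ob, tm and bs are as defined in the question; yAtTinf is just an illustrative name.
tinf <- quantile(ob, prob = 1/3)
yAtTinf <- approx(x = tm, y = bs, xout = tinf)$y   # bs value where tm equals tinf
plot(tm, bs, type = "l", col = "red")
segments(tinf, 0.0, tinf, yAtTinf, col = 'black', lty = 3, lwd = 1)
segments(min(tm), yAtTinf, tinf, yAtTinf, col = 'black', lty = 3, lwd = 1)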

Code for power method to find all eigenvalues and eigenvectors (in R)

I have no trouble implementing code to find the biggest eigenvalue and the corresponding eigenvector of a matrix using the power method.
What I have more trouble with is writing code that outputs all eigenvalues and eigenvectors of a given matrix at once. I am able to do it manually on a small matrix, but can't seem to generalize it properly.
I suspect it can be done elegantly with some recursion, but I'd need some help with that.
EDIT: I don't have trouble finding all the eigenvalues either; it's the eigenvectors that cause me trouble.
Here would be the code that does it manually:
#Example matrix
test.set=matrix(0,4,4)
test.set[1,]=c(-2,2,1,5)
test.set[2,]=c(2,5,8,8)
test.set[3,]=c(4,2,6,3)
test.set[4,]=c(5,-2,4,9)
The function to get one eigenvalue/eigenvector:
# Power method simple: return biggest eigenvalue and corresponding eigenvector
power_method_simple <- function(A, n_rep) {
  # Initialize with a random column of the matrix
  b_0 <- A[, sample(1:ncol(A), size = 1)]
  for (k in 1:n_rep) {
    b_0 <- A %*% b_0
    b_0_norm <- sqrt(t(b_0) %*% b_0)
    b_0 <- b_0 / b_0_norm[1, 1]
    print(b_0)
  }
  eigenvalue <- (t(b_0) %*% A %*% b_0)[1, 1]
  res_list <- list(b_0, eigenvalue)
  names(res_list) <- c("vector", "eigenvalue")
  return(res_list)
}
Now the example by hand:
#################
#By hand
#First
res1=power_method_simple(test.set,n_rep=1000)
first_ev=res1$vector
first_value=res1$eigenvalue
#Second
residual_matrix1=test.set-first_value*first_ev%*%t(first_ev)
res2=power_method_simple(residual_matrix1,n_rep=1000)
second_value=res2$eigenvalue
second_ev=(second_value-first_value)*res2$vector + first_value*((t(first_ev)%*%res2$vector)[1,1])*first_ev
second_ev=second_ev/sqrt((t(second_ev)%*%second_ev)[1,1])
#Third
residual_matrix2=residual_matrix1-second_value*res2$vector%*%t(res2$vector)
res3=power_method_simple(residual_matrix2,n_rep=1000)
third_value=res3$eigenvalue
u3=(third_value-second_value)*res3$vector + second_value*((t(res2$vector)%*%res3$vector)[1,1])*res2$vector
u3=u3/sqrt((t(u3)%*%u3)[1,1])
third_ev=(third_value-first_value)*u3 + first_value*((t(first_ev)%*%u3)[1,1])*first_ev
third_ev=third_ev/sqrt((t(third_ev)%*%third_ev)[1,1])
# It works for the first three
print(eigen(test.set)$vector)
print(cbind(first_ev,second_ev,third_ev))
I am using the answer to the following question to do this:
Answer to: Power method for finding all eigenvectors
How can I make a clean function that does everything at once out of that?
A recursive function like this seems to work:
find_vals <- function(matrix, values = list(), vectors = list(), evs = list(), n = nrow(matrix)) {
  if (n < 1) return(list(values, evs))
  res <- power_method_simple(matrix, n_rep = 1000)
  curr_val <- res$eigenvalue
  res_v <- res$vector
  i <- nrow(matrix) - n + 1
  values[i] <- curr_val
  vectors[[i]] <- res_v
  if (i == 1) {
    evs[[i]] <- res_v
  } else {
    curr_v <- vectors[[i]]
    for (k in (i - 1):1) {
      curr_v <- (values[[i]] - values[[k]]) * curr_v +
        values[[k]] * ((t(vectors[[k]]) %*% curr_v)[1, 1]) * vectors[[k]]
      curr_v <- curr_v / sqrt((t(curr_v) %*% curr_v)[1, 1])
    }
    evs[[i]] <- curr_v
  }
  matrix <- matrix - curr_val * res_v %*% t(res_v)
  return(find_vals(matrix, values, vectors, evs, n - 1))
}
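A hedged usage sketch (added here, not part of the answer): calling the function on the example matrix and comparing against base R's eigen(). Note that power_method_simple prints every iterate, so the output is verbose unless that print() call is removed.
out <- find_vals(test.set)
print(out[[1]])                   # eigenvalues, roughly in decreasing magnitude
print(do.call(cbind, out[[2]]))   # corresponding eigenvectors as columns
print(eigen(test.set)$values)     # reference values (may be complex in general)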

Identify which rows (or columns) have values in sparse Matrix

I need to identify the rows (/columns) that have defined values in a large sparse Boolean Matrix. I want to use this to 1. slice (actually view) the Matrix by those rows/columns; and 2. slice (/view) vectors and matrices that have the same dimensions as the margins of a Matrix. I.e. the result should probably be a Vector of indices / Bools or (preferably) an iterator.
I've tried the obvious:
a = sprand(10000, 10000, 0.01)
cols = unique(a.colptr)
rows = unique(a.rowvals)
but each of these takes about 20ms on my machine, probably because they allocate about 1MB (at the very least they allocate cols and rows). This is inside a performance-critical function, so I'd like the code to be optimized. Base seems to have an nzrange iterator for sparse matrices, but it is not easy for me to see how to apply it to my case.
Is there a suggested way of doing this?
Second question: I'd need to also perform this operation on views of my sparse Matrix - would that be something like x = view(a,:,:); cols = unique(x.parent.colptr[x.indices[:,2]]) or is there specialized functionality for this? Views of sparse matrices appear to be tricky (cf https://discourse.julialang.org/t/slow-arithmetic-on-views-of-sparse-matrices/3644 – not a cross-post)
Thanks a lot!
Regarding getting the non-zero rows and columns of a sparse matrix, the following functions should be pretty efficient:
nzcols(a::SparseMatrixCSC) = collect(i for i in 1:a.n if a.colptr[i] < a.colptr[i+1])
function nzrows(a::SparseMatrixCSC)
    active = falses(a.m)
    for r in a.rowval
        active[r] = true
    end
    return find(active)
end
For a 10_000x10_000 matrix with 0.1 density, it takes 0.2ms and 2.9ms for cols and rows, respectively. It should also be quicker than the method in the question (apart from the correctness issue there).
Regarding views of sparse matrices, a quick solution would be to turn the view into a sparse matrix (e.g. using b = sparse(view(a,100:199,100:199))) and use the functions above. In code:
nzcols(b::SubArray{T,2,P}) where {T,P<:AbstractSparseArray} = nzcols(sparse(b))
nzrows(b::SubArray{T,2,P}) where {T,P<:AbstractSparseArray} = nzrows(sparse(b))
A better solution would be to customize the functions according to view. For example, when the view uses UnitRanges for both rows and columns:
# utility predicate returning true if element of sorted v in range r
inrange(v, r) = searchsortedlast(v, last(r)) >= searchsortedfirst(v, first(r))

function nzcols(b::SubArray{T,2,P,Tuple{UnitRange{Int64},UnitRange{Int64}}}
               ) where {T,P<:SparseMatrixCSC}
    return collect(i+1-start(b.indexes[2])
                   for i in b.indexes[2]
                   if b.parent.colptr[i] < b.parent.colptr[i+1] &&
                      inrange(b.parent.rowval[nzrange(b.parent,i)], b.indexes[1]))
end
function nzrows(b::SubArray{T,2,P,Tuple{UnitRange{Int64},UnitRange{Int64}}}
               ) where {T,P<:SparseMatrixCSC}
    active = falses(length(b.indexes[1]))
    for c in b.indexes[2]
        for r in nzrange(b.parent, c)
            if b.parent.rowval[r] in b.indexes[1]
                active[b.parent.rowval[r]+1-start(b.indexes[1])] = true
            end
        end
    end
    return find(active)
end
These work faster than the versions for the full matrices (for a 100x100 submatrix of the 10,000x10,000 matrix above, cols and rows take 16μs and 12μs respectively on my machine, but these are unstable results).
A proper benchmark would use fixed matrices (or at least fix the random seed). I'll edit this line with such a benchmark if I do it.
In case the indices are not ranges, the fallback of converting to a sparse matrix works, but here are versions for indices which are Vectors. If the indices are mixed, yet another set of versions needs to be made. It is quite repetitive, but this is the strength of Julia: once the versions are written, the code will dispatch to the optimized methods based on the caller's types without too much effort.
function sortedintersecting(v1, v2)
    i, j = start(v1), start(v2)
    while i <= length(v1) && j <= length(v2)
        if v1[i] == v2[j] return true
        elseif v1[i] > v2[j] j += 1
        else i += 1
        end
    end
    return false
end
function nzcols(b::SubArray{T,2,P,Tuple{Vector{Int64},Vector{Int64}}}
               ) where {T,P<:SparseMatrixCSC}
    brows = sort(unique(b.indexes[1]))
    return [k
            for (k,i) in enumerate(b.indexes[2])
            if b.parent.colptr[i] < b.parent.colptr[i+1] &&
               sortedintersecting(brows, b.parent.rowval[nzrange(b.parent,i)])]
end

function nzrows(b::SubArray{T,2,P,Tuple{Vector{Int64},Vector{Int64}}}
               ) where {T,P<:SparseMatrixCSC}
    active = falses(length(b.indexes[1]))
    for c in b.indexes[2]
        active[findin(b.indexes[1], b.parent.rowval[nzrange(b.parent,c)])] = true
    end
    return find(active)
end
-- ADDENDUM --
Since it was noted that nzrows for Vector{Int} indices is a bit slow, here is an attempt to improve its speed by replacing findin with a version exploiting sortedness:
function findin2(inds, v, w)
    i, j = start(v), start(w)
    res = Vector{Int}()
    while i <= length(v) && j <= length(w)
        if v[i] == w[j]
            push!(res, inds[i])
            i += 1
        elseif (v[i] < w[j]) i += 1
        else j += 1
        end
    end
    return res
end

function nzrows(b::SubArray{T,2,P,Tuple{Vector{Int64},Vector{Int64}}}
               ) where {T,P<:SparseMatrixCSC}
    active = falses(length(b.indexes[1]))
    inds = sortperm(b.indexes[1])
    brows = (b.indexes[1])[inds]
    for c in b.indexes[2]
        active[findin2(inds, brows, b.parent.rowval[nzrange(b.parent,c)])] = true
    end
    return find(active)
end

R: data simulation

I am trying to run the following code:
library(MASS)   # lmsreg() comes from MASS
data_greene<-read.delim(file.choose(),header=T)
result_b2_HC0<-matrix(1:2000,ncol=4)
for (i in 1:500){
X1<-data_greene[[3]]*10^-4
X2<-X1^2
e<-rnorm(50,0,1)
sigma2<-exp(5.30+5.30*X1)
lambda<-max(sigma2)/min(sigma2)
Y<-1+1*X1+0*X2+sqrt(sigma2)*e
lms<-lmsreg(Y~X1+X2)
yhat<-lms$fitted
resid<-lms$residual
s<-abs(resid)
lms2<-lmsreg(s~yhat)
shat<-lms2$fitted
w1<-1/shat^2
scale<-lms$scale[1]
stdres<-resid/scale
e=abs(stdres)
w2<-NULL
for (i in 1:50){
if(e[i]<=1.345) w2[i]<-1 else w2[i]<-1.345/e[i]
}
w<-w1*w2
WLS<-lm(Y~X1+X2,weights=w)
res1<-WLS$residual
HCCMEHC0<-function(Y,X1,X2){
X<-cbind(1,X1,X2)
W<-diag(w)
inv<-solve(t(X)%*%W%*%X)
psi0<-diag(res1^2)
HC0<-inv%*%t(X)%*%W%*%psi0%*%W%*%X%*%inv
return(HC0)
}
result_b2_HC0[i,1]<-WLS$coef[3]
result_b2_HC0[i,2]<-sqrt(HCCMEHC0(Y,X1,X2)[3,3])
result_b2_HC0[i,3]<-result_b2_HC0[i,1]/result_b2_HC0[i,2]
result_b2_HC0[i,4]<-2*pt(-abs(result_b2_HC0[i,3]),df=47)
}
result_b2_HC0
I would expect the matrix to be complete, but the result only appears at row 50 in the matrix. What am I doing wrong?
You are using the same variable i in two nested for loops. Change the second for loop to use the variable j instead.
To avoid this kind of error, make sure you always use indentation. Also, learn to use vectorized operations. Your second loop can be rewritten from
e=abs(stdres)
w2<-NULL
for (i in 1:50){
if(e[i]<=1.345) w2[i]<-1 else w2[i]<-1.345/e[i]
}
to
e=abs(stdres)
w2 <- ifelse(e <= 1.345, 1, 1.345/e)
This is cleaner, easier to read, and faster.
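A small aside (added here, not part of the answer): with the vectorized ifelse() there is no inner loop index at all, so the clash with the outer i disappears. If the explicit loop is kept, renaming the index as suggested is enough:
for (j in 1:50) {
  if (e[j] <= 1.345) w2[j] <- 1 else w2[j] <- 1.345 / e[j]
}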

Multivariate Optimization keeps returning same initial values

I have the following code:
INJ.1<-"I01 I02 I03 I04 I05
2.78E+02 1.82E+03 3.62E+02 2.90E+02 7.73E+02
7.92E+02 1.21E+03 9.33E+02 6.32E+02 5.10E+02
2.30E+03 7.54E+02 9.60E+02 6.29E+02 1.05E+03
3.61E+03 3.05E+02 7.77E+02 5.87E+02 1.02E+03
3.89E+02 1.35E+03 7.66E+02 4.00E+02 7.43E+02
1.31E+03 1.63E+03 8.95E+02 3.85E+02 1.10E+02
1.39E+03 1.16E+03 9.07E+02 4.99E+02 2.48E+02
1.94E+03 1.09E+03 8.34E+02 5.22E+02 2.48E+02
2.04E+03 1.11E+03 7.85E+02 2.67E+02 4.27E+02
1.06E+03 1.36E+03 8.80E+02 6.13E+02 7.16E+02
1.40E+03 1.29E+03 8.65E+02 6.17E+02 9.79E+02
1.20E+03 1.68E+03 6.78E+02 6.10E+02 9.30E+02
1.45E+03 1.49E+03 7.66E+02 3.81E+02 1.07E+03
1.16E+03 1.58E+03 1.09E+03 5.33E+02 8.38E+02
1.33E+03 1.38E+03 9.10E+02 6.29E+02 8.80E+02
"
INJ<-as.matrix(read.table(text=INJ.1, header=T))
PRD.1<-"P01
981.32019
1062.5702
1439.7673
1694.0723
1085.1016
1243.6089
1191.5941
1302.2167
1333.5266
1242.0212
1342.6954
1371.2767
1394.1171
1400.7926
1373.1791
"
PRD<-as.matrix(read.table(text=PRD.1, header=T))
tao=as.matrix(c(1,1,1,1,1))
lambda=as.matrix(c(0.0251879,0.1599486,0.1812318,0.2626731,0.3355733,0.3221295,-1.3343501))
i.dash=matrix(ncol=ncol(INJ), nrow=(nrow(INJ)))
fn1 <- function (tao){
for (i in 1:ncol(INJ))
for (j in 1:nrow (INJ))
temp=0
for (k in 1:j)
i.dash[j,i]=(1/tao[i])*exp((k-j)/tao[i])*INJ[k,i]+i.dash[j,i]
target = abs(700-sum(colSums(i.dash)))
}
ini=c(1, 1, 1, 1, 1)
ans1<-optim(par=ini,fn1,hessian=TRUE)
I need to optimize the values of tao as shown in the function. The code keeps returning the same initial values, and in addition I noticed that the matrix calculation inside fn1 is not actually being carried out. Besides the main question, which is how to solve this case and reach the minimum of target, I have a few more questions:
Can we issue non-calculation commands inside the function, for example assigning and creating matrices, vector operations and manipulations, etc.?
Are these changes going to be available after we exit the function?
In my case I use the parameter values in some calculations first to prepare the objective function, and then I run the optimization on them. Is that an acceptable approach in R?
I would appreciate at least a starting point for optimizing this function.
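No answer is recorded for this question, so purely as a hedged starting point (a sketch added here, not a confirmed solution): the unbraced for loops repeat only the single statement that follows each of them, and i.dash starts out full of NA, so target never becomes a useful function of tao. Bracing the loops, initialising the accumulator, and returning the objective value gives something like the following, assuming INJ is the matrix built above.
fn1 <- function(tao) {
  i.dash <- matrix(0, nrow = nrow(INJ), ncol = ncol(INJ))
  for (i in 1:ncol(INJ)) {
    for (j in 1:nrow(INJ)) {
      acc <- 0
      for (k in 1:j) {
        acc <- acc + (1 / tao[i]) * exp((k - j) / tao[i]) * INJ[k, i]
      }
      i.dash[j, i] <- acc
    }
  }
  abs(700 - sum(colSums(i.dash)))   # the value optim() will minimise
}
ini <- c(1, 1, 1, 1, 1)
ans1 <- optim(par = ini, fn = fn1, hessian = TRUE)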
