I have a question, but honestly I do not know whether a solution exists in R.
I have an x/y plot, and I want to draw two straight lines: (1) from the x-axis up to the data curve, and (2) from the y-axis across to the data curve. For line 1 I already know the value: it is a tertile of my data. The question is how to find the exact point where that line intersects the curve, and then draw the second line from the y-axis to that point.
I have already tried taking the position on the x-axis and using the same position for y. This even works for some data, but not all, since the values do not always match.
Here is my example:
ob<-c(77.89824, 170.36929, 90.88129, 141.22368, 174.07871,
106.51393, 94.32576, 85.31712, 78.95808, 222.30143, 115.25760,
85.84704, 165.33504, 72.06912, 38.94912, 90.88129, 167.18976,
125.85600, 141.22367, 104.65922, 131.95009, 81.07777,
64.12032,130.36032, 89.29152, 65.97504, 40.27392, 64.38529,
113.40288)
tm<-c(38.94912, 40.27392, 64.12032, 64.38529, 65.97504, 72.06912,
77.89824, 78.95808, 81.07777, 85.31712, 85.84704, 89.29152,
90.88129, 94.32576, 104.65922, 106.51393, 113.40288, 115.25760,
125.85600, 130.36032, 131.95009, 141.22367, 141.22368, 165.33504,
167.18976, 170.36929, 174.07871)
bs<-c(0.96523390, 0.93066061, 0.89634466, 0.86213300, 0.82769878,
0.79311455, 0.75831596, 0.72311471, 0.68800759, 0.65245482,
0.61700818, 0.58163643, 0.51021060, 0.47393336, 0.43788203,
0.40203755, 0.36614804, 0.33059801, 0.29408090, 0.25820874,
0.22265365, 0.18803136, 0.15444785, 0.11931985, 0.08411248,
0.05098459, 0.01957279)
prc<-c(0.956974397, 0.914559074, 0.872836231, 0.831624652,
0.790544222, 0.749700646, 0.709038330, 0.668364230, 0.628275180,
0.588180704, 0.548730882, 0.509909531, 0.433282681, 0.395329802,
0.358306283, 0.322222487, 0.286868665, 0.252670119, 0.218461386,
0.185847964, 0.154593177, 0.125303855, 0.098121311, 0.071199383,
0.046104574, 0.024746731, 0.007529233)
plot(tm,bs,type="l",col="red")
lines(tm,prc,col="black")
tinf<-quantile(ob,prob=1/3)
tsup<-quantile(ob,prob=2/3)
idxinf<-which(tm>=(tinf-5) & tm<=(tinf+5))
infgrafico<-mean(prc[idxinf])
idxsup<-which(tm>=(tsup-5) & tm<=(tsup+5))
supgrafico<-mean(prc[idxsup])
segments(tinf,0.03, tinf,infgrafico,col='black',lty=3,lwd=1)
segments(min(tm),infgrafico,
tinf,infgrafico,col='black',lty=3,lwd=1)
text(tinf,cex=1,y=0,col="black",font=2,"T1")
segments(tsup,0.03, tsup,supgrafico,col='black',lty=3,lwd=1)
segments(min(tm),supgrafico,
tsup,supgrafico,col='black',lty=3,lwd=1)
text(tsup,cex=1,y=0,col="black",font=2,"T2")
But that is the problem: sometimes the values do not correspond exactly, so no match is found and the straight lines do not meet at the curve. And yes, I need something as automated as possible, because I have to save these values in a table and cannot find them by hand, one by one, by trial and error.
Thanks!
EDITED AFTER ONE ANSWER
TavoGLC, or anyone else, can you help with one more thing? I have problems with some cases in my data.
tm <-c (54.05184, 67.29985, 70.86991, 78.42816, 80.84780, 80.54784, 80.81280, 81.8774, 89.82144, 89.82144, 90.81314, 90.35136, 92.20607, 92.47104, 97.50528, 97.77025, 99.09504, 99.88993, 100.41985, 100.94976, 101.74465, 102.27456, 105.45408 , 105.71905, 116.05248, 118.43713, 122.94144, 125.06112, NA)
prc <-c (0.9454304, 0.9604309, 0.9604309, 0.9608306, 0.9608306, 0.9608309, 0.9608309, 0.9608309, 0.9604309, 0.9605930, , 0.4163839, 0.3624935, 0.3041409, 0.2327866, 0.1079731, NA)
tercil <-89.821
plot (tm, prc, type = "l")
abline (v = 89.821, col = "red")
Since the horizontal line (abline(h = ...)) can be placed over a very wide range here, running MakeLineCoords gives values such as 2.627424, which do not agree with my data (max = 1). My tm and prc data are arrays of dimension [360, 181, 29]; the example above is one of the failing cases, which I cut out of them. If you want, I can send you the complete test data.
This is the adaptation I made for my data:
yTinf = array (NA, c (360,181,29))
xTinf = array (NA, c (360,181,29))
for (i in 1:360) {
  for (j in 1:181) {
    for (k in 1:29) {
      yTinf [i, j, k] <- tercil [i, j, k]
      xTinf [i, j, k] <- MakeLineCoords (prc [i, j,], tm [i, j,], yTinf [i, j, k])
    }
  }
}
Even so, some results are greater than 1 and some are extrapolated, on the order of 2000 and 3000. From what I can tell, this is due to the problem above, since xTinf also contains some correct values.
Thank you so much!
You just have to create a function that finds the closest point in the data and interpolates linearly from that point.
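MakeLineCoords <- function(Xvals,Yvals,targetPoint) {
  # squared distance between each Y value and the target
  x <- (Yvals-targetPoint)^2
  # index of the data point closest to the target
  mL <- which.min(x)
  # slope from that point to its neighbour (the previous one if mL is the last point)
  nb <- if (mL < length(Yvals)) mL+1 else mL-1
  xVal <- Xvals[mL]+(targetPoint-Yvals[mL])*((Xvals[nb]-Xvals[mL])/(Yvals[nb]-Yvals[mL]))
  return(xVal)
}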
Then you can define a target value to be found and create the coordinates for the lines in the plot.
yTarget<-0.25
xTarget<-MakeLineCoords(tm,bs,yTarget)
plot(tm,bs,type="l",col="red")
segments(xTarget,0.0,xTarget,yTarget,col='black',lty=3,lwd=1)
segments(min(tm),yTarget,xTarget,yTarget,col='black',lty=3,lwd=1)
Using your example data, I get the following. Hope it helps.
tinf<-quantile(ob,prob=1/3)
# x position of the vertical line: the first tertile
xTarget<-tinf
# y position: the curve value interpolated at that x
yTarget<-MakeLineCoords(bs,tm,xTarget)
plot(tm,bs,type="l",col="red")
segments(xTarget,0.0,xTarget,yTarget,col='black',lty=3,lwd=1)
segments(min(tm),yTarget,xTarget,yTarget,col='black',lty=3,lwd=1)
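The nearest-point rule in MakeLineCoords takes its slope from the closest point and the point after it. When the target actually lies before the closest point, it extrapolates, and when two consecutive values are equal (as with 89.82144 in the follow-up data) it divides by zero; either may explain the values above 1 reported in the edit. A minimal sketch of an alternative in Python, assuming the goal is the first place the curve crosses the target: walk the curve segment by segment and interpolate only inside the segment that brackets the target. The helper first_crossing and the toy data are hypothetical; in R, the built-in approx() performs this kind of linear interpolation.

```python
import math


def first_crossing(a, b, target):
    """Return the b-value where the polyline (a, b) first reaches a == target.

    Segments are scanned in data order; inside the first segment whose
    a-values bracket `target`, b is interpolated linearly. If no segment
    brackets the target, None is returned instead of extrapolating.
    NaN pairs (the equivalent of R's NA) are dropped first.
    """
    pts = [(p, q) for p, q in zip(a, b) if not (math.isnan(p) or math.isnan(q))]
    for (a1, b1), (a2, b2) in zip(pts, pts[1:]):
        if min(a1, a2) <= target <= max(a1, a2):
            if a1 == a2:  # repeated value: zero-width segment, no slope to use
                return b1
            t = (target - a1) / (a2 - a1)
            return b1 + t * (b2 - b1)
    return None


# Hypothetical toy curve: time on x, a decreasing probability on y.
tm = [10.0, 20.0, 30.0, 40.0]
surv = [1.0, 0.8, 0.4, 0.0]

y_at_25 = first_crossing(tm, surv, 25.0)   # vertical line at x = 25
x_at_06 = first_crossing(surv, tm, 0.6)    # horizontal line at y = 0.6
outside = first_crossing(tm, surv, 50.0)   # beyond the data: no extrapolation
```

Swapping the arguments gives the other direction (x at a given y), so one helper covers both lines. Returning None instead of an extrapolated number makes out-of-range cases visible when the results are stored in a table.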
I have no trouble implementing code that finds the largest eigenvalue, and the corresponding eigenvector, of a matrix using the power method.
What I have more trouble with is writing code that outputs all eigenvalues and eigenvectors of a given matrix at once. I can do it by hand on a small matrix, but I can't seem to generalize it properly.
I suspect it can be done elegantly with some recursion, but I'd need some help with that.
EDIT: I also have no trouble finding all the eigenvalues; it's the eigenvectors that cause me trouble.
Here is the code that does it by hand:
#Example matrix
test.set=matrix(0,4,4)
test.set[1,]=c(-2,2,1,5)
test.set[2,]=c(2,5,8,8)
test.set[3,]=c(4,2,6,3)
test.set[4,]=c(5,-2,4,9)
The function to get one eigenvalue/eigenvector:
#Simple power method: returns the dominant (largest-magnitude) eigenvalue and the corresponding eigenvector
power_method_simple=function(A,n_rep) {
#Initialize with a random column of the matrix
b_0=A[,sample(1:ncol(A),size=1)]
for (k in 1:n_rep) {
b_0=A%*%b_0
b_0_norm=sqrt(t(b_0)%*%b_0)
b_0=b_0/b_0_norm[1,1]
print(b_0)
}
eigenvalue=(t(b_0)%*%A%*%b_0)[1,1]
res_list=list(b_0,eigenvalue)
names(res_list)=c("vector","eigenvalue")
return(res_list)
}
Now the example by hand:
#################
#By hand
#First
res1=power_method_simple(test.set,n_rep=1000)
first_ev=res1$vector
first_value=res1$eigenvalue
#Second
residual_matrix1=test.set-first_value*first_ev%*%t(first_ev)
res2=power_method_simple(residual_matrix1,n_rep=1000)
second_value=res2$eigenvalue
second_ev=(second_value-first_value)*res2$vector + first_value*((t(first_ev)%*%res2$vector)[1,1])*first_ev
second_ev=second_ev/sqrt((t(second_ev)%*%second_ev)[1,1])
#Third
residual_matrix2=residual_matrix1-second_value*res2$vector%*%t(res2$vector)
res3=power_method_simple(residual_matrix2,n_rep=1000)
third_value=res3$eigenvalue
u3=(third_value-second_value)*res3$vector + second_value*((t(res2$vector)%*%res3$vector)[1,1])*res2$vector
u3=u3/sqrt((t(u3)%*%u3)[1,1])
third_ev=(third_value-first_value)*u3 + first_value*((t(first_ev)%*%u3)[1,1])*first_ev
third_ev=third_ev/sqrt((t(third_ev)%*%third_ev)[1,1])
#It works for the first three
print(eigen(test.set)$vector)
print(cbind(first_ev,second_ev,third_ev))
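The update used for second_ev (and applied twice for third_ev) can be checked directly. A short derivation, assuming first_ev is a unit eigenvector $v_1$ of $A$ with $A v_1 = \lambda_1 v_1$, the deflated matrix is $B = A - \lambda_1 v_1 v_1^\top$ (residual_matrix1), and $u$, the vector returned by the second power-method call, satisfies $B u = \lambda u$:

$$A u = B u + \lambda_1 v_1 (v_1^\top u) = \lambda u + \lambda_1 (v_1^\top u)\, v_1$$

$$w = (\lambda - \lambda_1)\, u + \lambda_1 (v_1^\top u)\, v_1$$

$$A w = (\lambda - \lambda_1)\left[\lambda u + \lambda_1 (v_1^\top u)\, v_1\right] + \lambda_1^2 (v_1^\top u)\, v_1 = \lambda(\lambda - \lambda_1)\, u + \lambda \lambda_1 (v_1^\top u)\, v_1 = \lambda w$$

So, whenever $w \neq 0$, the normalized $w$ is an eigenvector of $A$ for $\lambda$, without requiring $A$ to be symmetric. For the third vector the same step is applied once per earlier deflation, most recent first, which is what u3 and third_ev do.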
I am using the answer to the following question to do this:
Answer to: Power method for finding all eigenvectors
How can I turn that into a clean function that does everything at once?
A recursive function like this seems to work:
find_vals=function(matrix, values=list(), vectors=list(), evs=list(), n=nrow(matrix)) {
if (n<1) return(list(values, evs))
res=power_method_simple(matrix,n_rep=1000)
curr_val = res$eigenvalue
res_v = res$vector
i = nrow(matrix) - n + 1
values[i] = curr_val
vectors[[i]] = res_v
if (i == 1) {
evs[[i]] = res_v
} else {
curr_v = vectors[[i]]
for (k in (i-1):1) {
curr_v = (values[[i]] - values[[k]])*curr_v + values[[k]]*((t(vectors[[k]])%*%curr_v)[1,1])*vectors[[k]]
curr_v=curr_v/sqrt((t(curr_v)%*%curr_v)[1,1])
}
evs[[i]] = curr_v
}
matrix=matrix-curr_val*res_v%*%t(res_v)
return (find_vals(matrix, values, vectors, evs, n-1))
}
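For comparison, here is the same deflation scheme as a plain loop instead of recursion. It is a minimal Python sketch using only the standard library (lists of rows as matrices), under stated assumptions: real eigenvalues of distinct magnitude, so each power iteration converges, and a fixed all-ones start vector (the R code starts from a random column) that has a nonzero component along the dominant eigenvector at every stage. All function names are hypothetical.

```python
import math


def mat_vec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def power_method(A, n_rep=500):
    """Dominant eigenpair of A by power iteration from an all-ones start."""
    b = normalize([1.0] * len(A))
    for _ in range(n_rep):
        b = normalize(mat_vec(A, b))
    return dot(b, mat_vec(A, b)), b  # Rayleigh quotient, unit vector


def all_eigenpairs(A, n_rep=500):
    """All eigenpairs of A by repeated deflation M <- M - lam * v v^T.

    Each power-method vector v is an eigenvector of the *deflated* matrix;
    it is mapped back to A by undoing the earlier deflations, most recent
    first, with w <- (lam - lam_k) * w + lam_k * (v_k . w) * v_k.
    """
    n = len(A)
    M = [row[:] for row in A]
    values, deflated_vecs, eigvecs = [], [], []
    for _ in range(n):
        lam, v = power_method(M, n_rep)
        w = v
        for lam_k, v_k in zip(reversed(values), reversed(deflated_vecs)):
            coef = lam_k * dot(v_k, w)
            w = normalize([(lam - lam_k) * wi + coef * vi for wi, vi in zip(w, v_k)])
        values.append(lam)
        deflated_vecs.append(v)
        eigvecs.append(w)
        # Deflate: since v is a unit eigenvector of M, lam becomes 0 in the new M.
        M = [[M[r][c] - lam * v[r] * v[c] for c in range(n)] for r in range(n)]
    return values, eigvecs


# Non-symmetric test matrix; upper triangular, so its eigenvalues are 4, 2, 1.
A = [[4.0, 1.0, 2.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]]
values, vectors = all_eigenpairs(A)
```

Keeping the raw power-method vectors (deflated_vecs) separate from the back-transformed ones (eigvecs) mirrors vectors and evs in the recursive answer; the update has to use the raw ones.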
I need to identify the rows (or columns) that have stored values in a large sparse Boolean matrix. I want to use this to 1. slice (actually view) the matrix by those rows/columns, and 2. slice (or view) vectors and matrices whose dimensions match the margins of the matrix. That is, the result should probably be a vector of indices or Bools or, preferably, an iterator.
I've tried the obvious:
a = sprand(10000, 10000, 0.01)
cols = unique(a.colptr)
rows = unique(a.rowval)
but each of these takes about 20 ms on my machine, probably because they allocate about 1 MB (at the very least they allocate cols and rows). This is inside a performance-critical function, so I'd like the code to be optimized. Base seems to have an nzrange iterator for sparse matrices, but it is not easy for me to see how to apply it to my case.
Is there a suggested way of doing this?
Second question: I also need to perform this operation on views of my sparse matrix. Would that be something like x = view(a,:,:); cols = unique(x.parent.colptr[x.indices[:,2]]), or is there specialized functionality for this? Views of sparse matrices appear to be tricky (cf https://discourse.julialang.org/t/slow-arithmetic-on-views-of-sparse-matrices/3644, not a cross-post).
Thanks a lot!
Regarding getting the non-zero rows and columns of a sparse matrix, the following functions should be pretty efficient:
nzcols(a::SparseMatrixCSC) = collect(i
for i in 1:a.n if a.colptr[i]<a.colptr[i+1])
function nzrows(a::SparseMatrixCSC)
active = falses(a.m)
for r in a.rowval
active[r] = true
end
return find(active)
end
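The two functions above rely on the CSC layout: in Julia, column j owns entries colptr[j] through colptr[j+1]-1 of rowval, so a column is nonempty exactly when two consecutive colptr entries differ, and the nonempty rows are the distinct entries of rowval. A minimal Python sketch of the same two ideas on hand-written 0-based CSC arrays (the names and the 4×4 example are hypothetical) also shows why unique(a.colptr) in the question returns offsets rather than column numbers:

```python
def nz_cols(colptr):
    """0-based indices of columns with at least one stored entry."""
    return [j for j in range(len(colptr) - 1) if colptr[j] < colptr[j + 1]]


def nz_rows(n_rows, rowval):
    """0-based indices of rows with at least one stored entry (boolean mask, like falses())."""
    active = [False] * n_rows
    for r in rowval:
        active[r] = True
    return [i for i, a in enumerate(active) if a]


# Hypothetical 4x4 matrix with entries at (row, col):
# (0,0), (2,0), (2,2), (3,2), (3,3); column 1 and row 1 are empty.
colptr = [0, 2, 2, 4, 5]
rowval = [0, 2, 2, 3, 3]

cols = nz_cols(colptr)
rows = nz_rows(4, rowval)
wrong = sorted(set(colptr))  # analogue of unique(a.colptr): offsets, not columns
```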
For a 10,000×10,000 matrix with density 0.1, these take 0.2 ms and 2.9 ms for cols and rows, respectively. They should also be faster than the method in the question, which additionally has a correctness issue: colptr stores offsets into rowval, not column indices, so unique(a.colptr) does not give the nonzero columns.
Regarding views of sparse matrices, a quick solution would be to turn the view into a sparse matrix (e.g. using b = sparse(view(a,100:199,100:199))) and use the functions above. In code:
nzcols(b::SubArray{T,2,P}) where {T,P<:AbstractSparseArray} = nzcols(sparse(b))
nzrows(b::SubArray{T,2,P}) where {T,P<:AbstractSparseArray} = nzrows(sparse(b))
A better solution would be to specialize the functions for the view type. For example, when the view uses UnitRanges for both rows and columns:
# utility predicate returning true if element of sorted v in range r
inrange(v,r) = searchsortedlast(v,last(r))>=searchsortedfirst(v,first(r))
function nzcols(b::SubArray{T,2,P,Tuple{UnitRange{Int64},UnitRange{Int64}}}
) where {T,P<:SparseMatrixCSC}
return collect(i+1-start(b.indexes[2])
for i in b.indexes[2]
if b.parent.colptr[i]<b.parent.colptr[i+1] &&
inrange(b.parent.rowval[nzrange(b.parent,i)],b.indexes[1]))
end
function nzrows(b::SubArray{T,2,P,Tuple{UnitRange{Int64},UnitRange{Int64}}}
) where {T,P<:SparseMatrixCSC}
active = falses(length(b.indexes[1]))
for c in b.indexes[2]
for r in nzrange(b.parent,c)
if b.parent.rowval[r] in b.indexes[1]
active[b.parent.rowval[r]+1-start(b.indexes[1])] = true
end
end
end
return find(active)
end
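For a view whose rows and columns are both contiguous ranges, the inrange test above asks whether a sorted list of row indices has any element inside the row range, using two binary searches instead of a scan. A Python sketch of that idea with bisect, on hypothetical 0-based CSC arrays; the half-open ranges and the function names are assumptions of the sketch, and rowval is assumed sorted within each column, as it is in a well-formed CSC matrix:

```python
from bisect import bisect_left


def in_range(v, lo, hi):
    """True if sorted list v has an element in the half-open range [lo, hi)."""
    return bisect_left(v, hi) > bisect_left(v, lo)


def nz_cols_view(colptr, rowval, rows, cols):
    """View-local indices of nonempty columns of the view rows x cols (both ranges)."""
    return [j - cols.start
            for j in cols
            if colptr[j] < colptr[j + 1]
            and in_range(rowval[colptr[j]:colptr[j + 1]], rows.start, rows.stop)]


def nz_rows_view(colptr, rowval, rows, cols):
    """View-local indices of nonempty rows of the view rows x cols (both ranges)."""
    active = [False] * len(rows)
    for j in cols:
        for r in rowval[colptr[j]:colptr[j + 1]]:
            if r in rows:  # constant-time membership test on a range
                active[r - rows.start] = True
    return [i for i, a in enumerate(active) if a]


colptr = [0, 2, 2, 4, 5]
rowval = [0, 2, 2, 3, 3]
view_rows, view_cols = range(1, 3), range(0, 4)  # rows 1-2, all columns
```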
These run faster than the versions for full matrices: for a 100×100 submatrix of the 10,000×10,000 matrix above, cols and rows take 16 μs and 12 μs, respectively, on my machine, but these timings are unstable.
A proper benchmark would use fixed matrices (or at least a fixed random seed). I'll edit this answer with such a benchmark if I do one.
If the indices are not ranges, the fallback of converting to a sparse matrix works, but here are versions for indices that are Vectors. If the indices are mixed, yet another set of versions is needed. This is quite repetitive, but it is also a strength of Julia: once the versions are written, dispatch picks the optimized method from the argument types in the caller without much effort.
function sortedintersecting(v1, v2)
i,j = start(v1), start(v2)
while i <= length(v1) && j <= length(v2)
if v1[i] == v2[j] return true
elseif v1[i] > v2[j] j += 1
else i += 1
end
end
return false
end
function nzcols(b::SubArray{T,2,P,Tuple{Vector{Int64},Vector{Int64}}}
) where {T,P<:SparseMatrixCSC}
brows = sort(unique(b.indexes[1]))
return [k
for (k,i) in enumerate(b.indexes[2])
if b.parent.colptr[i]<b.parent.colptr[i+1] &&
sortedintersecting(brows,b.parent.rowval[nzrange(b.parent,i)])]
end
function nzrows(b::SubArray{T,2,P,Tuple{Vector{Int64},Vector{Int64}}}
) where {T,P<:SparseMatrixCSC}
active = falses(length(b.indexes[1]))
for c in b.indexes[2]
active[findin(b.indexes[1],b.parent.rowval[nzrange(b.parent,c)])] = true
end
return find(active)
end
-- ADDENDUM --
Since it was pointed out that nzrows for Vector{Int} indices is a bit slow, here is an attempt to speed it up by replacing findin with a version that exploits sortedness:
function findin2(inds,v,w)
i,j = start(v),start(w)
res = Vector{Int}()
while i<=length(v) && j<=length(w)
if v[i]==w[j]
push!(res,inds[i])
i += 1
elseif (v[i]<w[j]) i += 1
else j += 1
end
end
return res
end
function nzrows(b::SubArray{T,2,P,Tuple{Vector{Int64},Vector{Int64}}}
) where {T,P<:SparseMatrixCSC}
active = falses(length(b.indexes[1]))
inds = sortperm(b.indexes[1])
brows = (b.indexes[1])[inds]
for c in b.indexes[2]
active[findin2(inds,brows,b.parent.rowval[nzrange(b.parent,c)])] = true
end
return find(active)
end
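The idea behind findin2 is a merge walk: sort the view's row indices once, remembering where each one came from (as sortperm does), then step through them alongside each column's already-sorted rowval slice, advancing whichever side is smaller. A Python sketch of the same walk on hypothetical 0-based CSC arrays; the function names are illustrative only:

```python
def findin_sorted(perm, v_sorted, w_sorted):
    """Merge-walk two sorted lists; for every v_sorted[i] that also occurs in
    w_sorted, report perm[i], its position in the original (unsorted) index list."""
    res, i, j = [], 0, 0
    while i < len(v_sorted) and j < len(w_sorted):
        if v_sorted[i] == w_sorted[j]:
            res.append(perm[i])
            i += 1  # keep j, so duplicates in v_sorted also match
        elif v_sorted[i] < w_sorted[j]:
            i += 1
        else:
            j += 1
    return res


def nz_rows_vector_view(colptr, rowval, row_idx, col_idx):
    """View-local indices of nonempty rows when both view indices are lists."""
    perm = sorted(range(len(row_idx)), key=row_idx.__getitem__)  # like sortperm
    brows = [row_idx[p] for p in perm]
    active = [False] * len(row_idx)
    for c in col_idx:
        for k in findin_sorted(perm, brows, rowval[colptr[c]:colptr[c + 1]]):
            active[k] = True
    return [k for k, a in enumerate(active) if a]


# Same hypothetical 4x4 matrix as a CSC pair: entries at (0,0), (2,0), (2,2), (3,2), (3,3).
colptr = [0, 2, 2, 4, 5]
rowval = [0, 2, 2, 3, 3]
```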
I am trying to run the following code:
library(MASS)  # provides lmsreg()
data_greene<-read.delim(file.choose(),header=T)
result_b2_HC0<-matrix(1:2000,ncol=4)
for (i in 1:500){
X1<-data_greene[[3]]*10^-4
X2<-X1^2
e<-rnorm(50,0,1)
sigma2<-exp(5.30+5.30*X1)
lambda<-max(sigma2)/min(sigma2)
Y<-1+1*X1+0*X2+sqrt(sigma2)*e
lms<-lmsreg(Y~X1+X2)
yhat<-lms$fitted
resid<-lms$residual
s<-abs(resid)
lms2<-lmsreg(s~yhat)
shat<-lms2$fitted
w1<-1/shat^2
scale<-lms$scale[1]
stdres<-resid/scale
e=abs(stdres)
w2<-NULL
for (i in 1:50){
if(e[i]<=1.345) w2[i]<-1 else w2[i]<-1.345/e[i]
}
w<-w1*w2
WLS<-lm(Y~X1+X2,weights=w)
res1<-WLS$residual
HCCMEHC0<-function(Y,X1,X2){
X<-cbind(1,X1,X2)
W<-diag(w)
inv<-solve(t(X)%*%W%*%X)
psi0<-diag(res1^2)
HC0<-inv%*%t(X)%*%W%*%psi0%*%W%*%X%*%inv
return(HC0)
}
result_b2_HC0[i,1]<-WLS$coef[3]
result_b2_HC0[i,2]<-sqrt(HCCMEHC0(Y,X1,X2)[3,3])
result_b2_HC0[i,3]<-result_b2_HC0[i,1]/result_b2_HC0[i,2]
result_b2_HC0[i,4]<-2*pt(-abs(result_b2_HC0[i,3]),df=47)
}
result_b2_HC0
I would expect the whole matrix to be filled in, but results only appear in row 50. What am I doing wrong?
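You are using the same variable i in two nested for loops. After the inner loop finishes, i is 50, so every assignment to result_b2_HC0[i, ] that follows it writes to row 50. Change the inner for loop to use the variable j instead (including e[j] and w2[j] inside it).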
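In R, assigning to the loop variable inside the body does not change which value the next iteration gets, so the outer loop still runs 500 times, but every statement after the inner loop sees i equal to 50. Python's for loop behaves the same way, so a small Python sketch (fill_rows is a hypothetical name) reproduces the pattern: only one row is ever written.

```python
def fill_rows(n_outer, n_inner, reuse_name):
    """Mimic the question's nested loops: each outer pass stores one value in row i."""
    result = [None] * n_outer
    for i in range(n_outer):
        total = 0
        if reuse_name:
            for i in range(n_inner):  # inner loop rebinds the outer i (the bug)
                total += 1
        else:
            for j in range(n_inner):  # separate loop variable (the fix)
                total += 1
        result[i] = total  # after the buggy loop, i == n_inner - 1 every time
    return result


buggy = fill_rows(5, 3, reuse_name=True)
fixed = fill_rows(5, 3, reuse_name=False)
```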
To avoid this kind of error, always indent your code. Also, learn to use vectorized operations. Your inner loop can be rewritten from
e=abs(stdres)
w2<-NULL
for (i in 1:50){
if(e[i]<=1.345) w2[i]<-1 else w2[i]<-1.345/e[i]
}
to
e=abs(stdres)
w2<-ifelse( e <= 1.345, 1, 1.345/e )
This is cleaner, easier to read, and faster.
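The ifelse() line computes Huber-type weights, w = min(1, 1.345/|r|) for each standardized residual r, where 1.345 is the usual Huber tuning constant. A Python sketch of the same element-wise rule (huber_weights is a hypothetical name) makes the formula explicit; note that it takes abs() itself, whereas the R code has already stored abs(stdres) in e:

```python
def huber_weights(std_resid, c=1.345):
    """Weight 1 for |r| <= c and c/|r| beyond it, i.e. min(1, c/|r|) element-wise."""
    return [1.0 if abs(r) <= c else c / abs(r) for r in std_resid]


weights = huber_weights([0.5, -2.69, 1.345])
```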