I tried the PageRank function on a normal matrix with no null node. The i-th row represents the node and the j-th column the transition, so Matrix[i, j] denotes the transition from node i to node j.
[Image: transition matrix B]
library(igraph)
#-----B is the matrix----#
g2 <- graph_from_adjacency_matrix(B, mode = "directed", weighted = TRUE)
plot(g2)
B1 <- page.rank(g2, damping = 1)$vector
C1 <- as.data.frame(B1)
This gave me the following result (no damping):
The PageRank comes out to be (3/9, 2/9, 2/9, 2/9)
Now, I applied the same to another matrix with the Null node:
[Image: new matrix, with the 3rd row being the null node]
What I expect to get is a vector of (0, 0, 0, 0), but what I actually get from the function is:
The PageRank comes out to be (0.2, 0.2666666, 0.2666666, 0.2666666)
What am I doing wrong?
As far as I understand, PageRank is not defined when there are nodes with an out-degree of zero (as you have here). According to this answer to a related question, How does pageranking algorithm deal with webpage without outbound links?, this is commonly dealt with by connecting the offending node to all of the others (including itself) with equal probability.
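As a rough sketch of that fix (assuming B holds your second matrix, with the null node stored as an all-zero row):
# Give any all-zero row a uniform distribution over every node (including
# the node itself), then build the graph as before.
zero_out <- rowSums(B) == 0
B[zero_out, ] <- 1 / ncol(B)
g_fixed <- graph_from_adjacency_matrix(B, mode = "directed", weighted = TRUE)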
I tried this out with your example
B1 <- structure(c(0, 0.5, 0.25, 0, 0.333333333333333, 0, 0.25, 0.5,
0.333333333333333, 0, 0.25, 0.5, 0.333333333333333, 0.5, 0.25,
0), .Dim = c(4L, 4L))
g1 <- graph_from_adjacency_matrix(B1, mode = "directed", weighted = TRUE)
page_rank(g1, damping = 1)$vector
and this gives
[1] 0.2000000 0.2666667 0.2666667 0.2666667
which is the same as you have.
[Updated from comments]
Under the hood igraph uses prpack, so that must be accounting for nodes with zero out-degree. If you want to flag the problem before you run page_rank you could just check:
any(degree(g1, mode = "out") == 0)
To actually get the zero vector that you want (and preserve node names), you could do something like:
outdeg <- degree(g1, mode = "out")
if (any(outdeg == 0)) {
  B2 <- 0 * outdeg
} else {
  B2 <- page_rank(g1, damping = 1)$vector
}
Or, even more compactly (the all(... > 0) factor is 1 only when every node has outgoing links, so B2 is zeroed exactly when a zero out-degree node is present):
B2 <- all(degree(g1, mode = "out") > 0) * page_rank(g1, damping = 1)$vector
I have a data frame with 3 columns and 40 rows. The first two columns contain values ranging from -1 to 1, and the 3rd column contains their sum. I would like to change values close to zero in the 3rd column, such as 0.3, 0.2, 0.1, -0.1, -0.2, -0.3, to zero, and leave the rest as they are.
library(dplyr)
set.seed(2)
D <- data.frame(from = runif(40, -1, 1), to = runif(40, -1, 1)) %>%
  dplyr::mutate(weight = from + to)
I appreciate your help.
With replace, check if the absolute value is below a threshold:
thrs = 0.5
transform(D, weight = replace(weight, abs(weight) < thrs, 0))
Or in the dplyr framework:
D %>%
  mutate(weight = replace(weight, abs(weight) < 0.5, 0))
Within some matrix algebra I found the expression B = ker(A), where A is a 3x4 transformation matrix. The following two links gave me some vague idea about ker() in general:
Wolfram: Kernel
Calculate the dimensions and basis of the kernel Ker(f)
But frankly, I still cannot see how to get a 4x1 vector as the result. How would this kernel be calculated in R? Some additional background/links would also be appreciated.
Here is the matrix A and the result B (or its transpose...).
A = structure(c(0.9, 1.1, 1.2, 0.8, 0, 0.5, 0.3, 0.1, 0.5, 0, 0.2,
0.7), .Dim = 4:3)
B = structure(c(0.533, 0.452, -0.692, -0.183), .Dim = c(4L, 1L))
I did get as far as realizing that each row of the A matrix times B equals zero, just like in the examples. But for solving the set of linear equations I am missing one more equation, am I not?
With the pracma package (the kernel here is one-dimensional, so the basis vector is only determined up to scale; that is why the result below is just the negative of your B):
pracma::nullspace(t(A))
# [,1]
# [1,] -0.5330006
# [2,] -0.4516264
# [3,] 0.6916801
# [4,] 0.1830918
With the MASS package:
MASS::Null(A)
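Either way, a quick base-R sanity check (just a sketch, reusing the A from the question): the returned basis vector should be annihilated by t(A).
# t(A) %*% kerA should be numerically zero (on the order of machine precision)
kerA <- pracma::nullspace(t(A))   # or MASS::Null(A)
max(abs(t(A) %*% kerA))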
I have been trying to troubleshoot a problem for the past couple of hours with no luck.
I found one similar error message on another post here: Plotting Ellipse3d in R Plotly with surface ellipse
The only response suggests that it is caused by a lack of samples, but my sample size is 3, which is the minimum for creating an ellipse.
Another response on a different post suggests it could be because the matrix passed to ellipse3d() is not a symmetric square matrix, but mine is symmetric because I pass cov(data_frame_of_interest).
More about the problem:
I am trying to plot a 3D PCA using the rgl package in R and prcomp(), where top_noSR3 is a prcomp() object. I am interested in the first 3 PCs for the following example.
#Here is the exact PCA object obtained from prcomp(). It is
#top_noSR3$x[,c(1,2,3)].
top_noSR3 <- list(x = structure(c(-1.51867921165966, 4.7156146538954, -0.812795773332441,
-2.38413966890339, -2.57305046487183, 8.4906222835565, 1.66680581870767,
-7.5843776373923, 3.35199279752431, 2.4452307290032, -7.33666903970592,
1.53944551317846, -1.72127982564219, 2.68563509110228, -2.75720515666432,
1.79284989120424, -0.962414945057766, -1.44498205066367, -0.270435252418625,
2.67783224814005, -0.780985238414212, 7.3028581587737, -1.3641698814598,
-5.15770303889969, 0.989625278145346, 0.101301456051966, 1.96037141973899,
-3.05129815393625, -2.29377625653868, -4.3601221831666, 5.9554046825257,
0.698493757179566, 2.3724852157055, 0.0505080025883657, -1.97013906903266,
-0.452854149261224), .Dim = c(12L, 3L), .Dimnames = list(c("SR1.4",
"SR1.5", "SR1.6", "SR1.7", "SR2.2", "SR2.3", "SR2.4", "SR2.5",
"CI1.4", "CI1.5", "CI1.6", "CI1.7"), c("PC1", "PC2", "PC3"))))
#col_object is an array of colours used for colouring the points in the PCA:
col_object <- c("blue","red","green","purple","blue","red",
"green","purple","blue","red","green","purple")
plot3d(top_noSR3$x[,c(1,2,3)], type = "s", size = 1, col = col_object)
#Divide object into 3 components for 3d ellipse plotting function
for_plot3d <- function(casetime) {
  return(top_noSR3$x[casetime, c(1, 2, 3)])
}
BASE <- c("SR1.4", "SR2.2", "CI1.4")
EA <- c("SR1.5", "SR2.3", "CI1.5")
LA <- c("SR1.6", "SR2.4", "CI1.6")
FU <- c("SR1.7", "SR2.5", "CI1.7")
BASE_for3d <- for_plot3d(BASE)
EA_for3d <- for_plot3d(EA)
LA_for3d <- for_plot3d(LA)
FU_for3d <- for_plot3d(FU)
#The following arguments used above (BASE,EA, etc.) ^ are
#timepoints/rownames of interest. length(BASE) = ... = length(FU) = 3
center4plot <- function(x) {
  k <- mean(x[, 1])
  h <- mean(x[, 2])
  z <- mean(x[, 3])
  return(c(k, h, z))
}
center_base <- center4plot(BASE_for3d)
center_EA <- center4plot(EA_for3d)
center_LA <- center4plot(LA_for3d)
center_FU <- center4plot(FU_for3d)
ellipse_base <- ellipse3d(cov(BASE_for3d), level = 0.95, centre = center_base)
ellipse_EA <- ellipse3d(cov(EA_for3d), level = 0.95, centre = center_EA)
# ERROR appears here, in ellipse_LA
ellipse_LA <- ellipse3d(cov(LA_for3d), level = 0.95, centre = center_LA)
ellipse_FU <- ellipse3d(cov(FU_for3d), level = 0.95, centre = center_FU)
I get the following message when I run the ellipse_LA line:
ellipse3d() Error in chol.default(cov) : the leading minor of order 3 is not positive definite
Thank you all for your time and effort.
When I run the current version of the code, I get the error you saw, but already at the first ellipse:
> ellipse_base <- ellipse3d(cov(BASE_for3d),level = 0.95, centre=center_base)
Error in chol.default(cov) :
the leading minor of order 3 is not positive definite
If I look at the eigenvalues of that matrix, I see that the message is correct:
> eigen(cov(BASE_for3d))
eigen() decomposition
$values
[1] 1.454977e+01 1.433775e+00 8.521084e-16
$vectors
[,1] [,2] [,3]
[1,] 0.81230516 0.5213444 0.2614581
[2,] 0.04362183 0.3927276 -0.9186197
[3,] 0.58159906 -0.7576048 -0.2962727
Notice that the smallest eigenvalue is approximately zero.
The BASE_for3d object has 3 points in it. They all lie in a plane (as any 3 points must), so their covariance matrix is rank 2, not rank 3. You need BASE to include at least 4 rows that are not all in one plane, not just 3. For example,
BASE <- c("SR1.4", "SR2.2", "CI1.4", "SR1.5")
This results in the following:
BASE_for3d <- for_plot3d(BASE)
center_base <- center4plot(BASE_for3d)
ellipse_base <- ellipse3d(cov(BASE_for3d),level = 0.95, centre=center_base)
shade3d(ellipse_base, alpha = 0.2)
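If you want to catch this situation up front rather than letting chol() fail, a small guard along these lines (just a sketch) would do:
# Check that the covariance matrix has full rank 3 before calling ellipse3d();
# with only 3 (necessarily coplanar) points it is rank 2 and chol() fails.
S <- cov(BASE_for3d)
if (qr(S)$rank < 3) {
  stop("points are coplanar: covariance matrix is not positive definite")
} else {
  ellipse_base <- ellipse3d(S, level = 0.95, centre = center_base)
}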
I am confused by the return value of the function get.basis(). For example,
lprec <- make.lp(0, 4)
set.objfn(lprec, c(1, 3, 6.24, 0.1))
add.constraint(lprec, c(0, 78.26, 0, 2.9), ">=", 92.3)
add.constraint(lprec, c(0.24, 0, 11.31, 0), "<=", 14.8)
add.constraint(lprec, c(12.68, 0, 0.08, 0.9), ">=", 4)
set.bounds(lprec, lower = c(28.6, 18), columns = c(1, 4))
set.bounds(lprec, upper = 48.98, columns = 4)
RowNames <- c("THISROW", "THATROW", "LASTROW")
ColNames <- c("COLONE", "COLTWO", "COLTHREE", "COLFOUR")
dimnames(lprec) <- list(RowNames, ColNames)
solve(lprec)
Then the basic variables are
> get.basis(lprec)
[1] -7 -2 -3
However, the solution is
> get.variables(lprec)
[1] 28.60000 0.00000 0.00000 31.82759
From the solution, it seems variables 1 and 4 are basic. So where does the vector (-7, -2, -3) come from?
I am guessing it has to do with the 3 constraints and 4 decision variables.
After reviewing the simplex method for bounded variables, I finally understood what happens. These two links are helpful: Example; Video
Coming back to this problem: lpSolveAPI (the R interface for lp_solve) rewrites the constraints by adding appropriate slack variables, so the first three columns correspond to the slack variables and the four decision variables occupy columns 4-7. Hence the return of get.basis(), which is -7, -2, -3, refers to columns 7, 2 and 3, i.e. decision variable 4, slack variable 2 and slack variable 3.
In this kind of LP with bounded variables, a variable can be nonbasic at either its lower bound or its upper bound. The return of get.basis(lp, nonbasic = TRUE) is -1, -4, -5, -6; the minus signs mean these variables are at their lower bounds, i.e. slack variable 1 = 0, column 4 (decision variable 1) = 28.6, and columns 5 and 6 (decision variables 2 and 3) = 0.
Thus the optimal solution is 28.6 (nonbasic), 0 (nonbasic), 0 (nonbasic), 31.83 (basic).
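For reference, the index vectors discussed above come from calls like these (a sketch, assuming the solved lprec object from the question):
get.basis(lprec)                   # basic columns, reported above as -7 -2 -3
get.basis(lprec, nonbasic = TRUE)  # also lists the nonbasic columns -1 -4 -5 -6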
I would like to dynamically parameterize an array for a state-space model depending on how many states I choose.
I am doing this with a loop -
Q <- function(params, states) {
  qmat <- matrix(0, statespace, statespace)
  for (i in 1:statespace)
    qmat[i, i] <- statshockvar(params[(i-1)*5 + 1], params[(i-1)*5 + 2],
                               params[(i-1)*5 + 3], states[i])
  qmat
}
This function is called many times, as the point of the program is to optimize a parameter set. However, this setup is slowing down the optimization phase substantially, because this function and a bunch of others like it keep getting called and keep redefining the arrays.
How can I define the arrays I need once, dynamically, with the relevant parameters as above, and then be able to call the matrix function with a new set of parameters for optimization?
Thanks!
Edit -
statespace is just an integer giving the number of states to use in the model, say 3.
statshockvar <- function(meanrev, longrun, sigma, sstate) {
  longrun*sigma^2/(2*meanrev)*(1 - exp(-longrun))^2 +
    sigma^2/longrun*(exp(-longrun) - exp(-2*longrun))*sstate
}
statshockvar, in this particular example, is the discretized variance of a CIR model for the term structure.
Edit 2 -
params looks like this - please note these are just arbitrary numbers:
params = c(0.3275, 0.07, 0.197, 0, 0.05,
           0.01, 0.2, 0.3, 0, 0.05,
           0.01, 0.1, 0.3, 0, 0.05)
states would be something like this -
states = c(0.07,0.07,0.07)
again these states are arbitrary.
Here's a solution:
Q <- function(params, states) {
  diag(mapply(function(y, z) statshockvar(y[1], y[2], y[3], z),
              lapply(seq(statespace), function(x) params[(x-1)*5 + 1:3]),
              states))
}
Test with the example parameters:
Q(params, states)
[,1] [,2] [,3]
[1,] 0.002465305 0.00000000 0.000000000
[2,] 0.000000000 0.03424762 0.000000000
[3,] 0.000000000 0.00000000 0.009499883
Looking at the for loop,
for (i in 1:statespace)
  qmat[i, i] <- statshockvar(params[(i-1)*5 + 1], params[(i-1)*5 + 2],
                             params[(i-1)*5 + 3], states[i])
If statshockvar is vectorized, you can simply write
diag(qmat) <- statshockvar(params[((1:statespace)-1)*5+1],
                           params[((1:statespace)-1)*5+2],
                           params[((1:statespace)-1)*5+3],
                           states[1:statespace])
If it's not, see ?Vectorize to make it so.
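For completeness, here is a sketch of the ?Vectorize route (only needed if statshockvar did not already work element-wise on vector arguments):
# Wrap statshockvar so it maps element-wise over its vector arguments
statshockvar_v <- Vectorize(statshockvar)
idx <- (seq_len(statespace) - 1) * 5
diag(qmat) <- statshockvar_v(params[idx + 1], params[idx + 2],
                             params[idx + 3], states)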