I'm trying to solve a 7x2 matrix problem of the form below using R:
A=array(c(5.54,0.96,1.59,2.07,0.73,10.64,8.28,1.41,3.77,3.11,3.74,2.93,8.29,3.33), c(7,2))
A
# [,1] [,2]
#[1,] 5.54 1.41
#[2,] 0.96 3.77
#[3,] 1.59 3.11
#[4,] 2.07 3.74
#[5,] 0.73 2.93
#[6,] 10.64 8.29
#[7,] 8.28 3.33
b=c(80814.25,34334.75,47921.75,59514.25,26981.25,63010.25,46646.25)
b
#[1] 80814.25 34334.75 47921.75 59514.25 26981.25 63010.25 46646.25
solve (A,b)
Error in solve.default(A, b) : 'a' (7 x 2) must be square
A %*% solve (A,b)
Error in solve.default(A, b) : 'a' (7 x 2) must be square
What can I do to solve this? I need the solution for the two variables, x1 and x2, in the 7x2 system stated above.
You're using solve() on a non-square matrix, but it needs a square input. The help page ?solve mentions that you can use qr.solve for non-square matrices:
qr.solve(A,b)
[,1]
[1,] 3741.208
[2,] 6552.174
You might want to check that this is correct for your purposes: for an overdetermined system like this one, qr.solve returns the least-squares solution rather than an exact one. There are other ways to solve these types of problems, but this might help you.
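To see that qr.solve is indeed doing least squares here, a quick sketch compares it with lm() fitted without an intercept on the same data:

```r
# qr.solve on a tall (overdetermined) matrix gives the least-squares
# solution of A x ~ b; it should agree with lm() without an intercept.
A <- array(c(5.54, 0.96, 1.59, 2.07, 0.73, 10.64, 8.28,
             1.41, 3.77, 3.11, 3.74, 2.93, 8.29, 3.33), c(7, 2))
b <- c(80814.25, 34334.75, 47921.75, 59514.25, 26981.25, 63010.25, 46646.25)

x_qr <- qr.solve(A, b)                # least-squares solution
x_lm <- unname(coef(lm(b ~ A - 1)))  # same fit via linear regression
stopifnot(isTRUE(all.equal(x_qr, x_lm)))
```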
The corpcor package offers a pseudoinverse function for finding the inverse of a rectangular matrix:
library(corpcor)
pseudoinverse(A)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.06271597 -0.05067830 -0.02922597 -0.03265713 -0.03964039 0.0230086
[2,] -0.05845856 0.08551514 0.05661287 0.06532450 0.06674243 0.0391552
[,7]
[1,] 0.07239133
[2,] -0.05420334
pseudoinverse(A) %*% b
[,1]
[1,] 3741.208
[2,] 6552.174
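If you'd rather avoid the extra dependency, the same pseudoinverse can be built from base R's svd() (a minimal sketch; corpcor's version handles the rank tolerance more carefully):

```r
# Moore-Penrose pseudoinverse via SVD: pinv(M) = V %*% D+ %*% t(U),
# where D+ inverts only the singular values above a tolerance.
pinv <- function(M, tol = 1e-10) {
  s <- svd(M)
  d <- ifelse(s$d > tol, 1 / s$d, 0)
  s$v %*% (d * t(s$u))   # d * t(s$u) scales row i of t(U) by d[i]
}

A <- array(c(5.54, 0.96, 1.59, 2.07, 0.73, 10.64, 8.28,
             1.41, 3.77, 3.11, 3.74, 2.93, 8.29, 3.33), c(7, 2))
b <- c(80814.25, 34334.75, 47921.75, 59514.25, 26981.25, 63010.25, 46646.25)

# Agrees with the least-squares solution from qr.solve
stopifnot(isTRUE(all.equal(drop(pinv(A) %*% b), qr.solve(A, b))))
```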
I'm using the R package geigen to solve the generalized eigenvalue problem A*V = lambda*B*V.
This is the code:
geigen(Gamma_chi_0, diag(diag(Gamma_xi_0)),symmetric=TRUE, only.values=FALSE) #GENERALIZED EIGENVALUE PROBLEM
Where:
Gamma_chi_0
[,1] [,2] [,3] [,4] [,5]
[1,] 1.02346 -0.50204 0.41122 -0.73066 0.00072
[2,] -0.50204 0.96712 -0.33526 0.51774 -0.37708
[3,] 0.41122 -0.33526 1.05086 0.09798 0.09274
[4,] -0.73066 0.51774 0.09798 0.99780 -0.51596
[5,] 0.00072 -0.37708 0.09274 -0.51596 1.03354
and
diag(diag(Gamma_xi_0))
[,1] [,2] [,3] [,4] [,5]
[1,] -0.0234 0.0000 0.0000 0.0000 0.0000
[2,] 0.0000 0.0329 0.0000 0.0000 0.0000
[3,] 0.0000 0.0000 -0.0509 0.0000 0.0000
[4,] 0.0000 0.0000 0.0000 0.0022 0.0000
[5,] 0.0000 0.0000 0.0000 0.0000 -0.0335
But I get this error:
> geigen(Gamma_chi_0, diag(diag(Gamma_xi_0)), only.values=FALSE)
Error in .sygv_Lapackerror(z$info, n) :
Leading minor of order 1 of B is not positive definite
In matlab, using the same two matrices, it works:
opt.disp = 0;
[P, D] = eigs(Gamma_chi_0, diag(diag(Gamma_xi_0)),r,'LM',opt);
% compute first r generalized eigenvectors and eigenvalues
For example I get the following eigenvalues matrix
D =
427.8208 0
0 -38.6419
Of course in Matlab I computed only the first r = 2; in R I want all the eigenvalues and eigenvectors (n = 5) and will then subset the first 2.
Can someone help me solve this?
geigen has detected a symmetric matrix for Gamma_chi_0, and the Lapack routine for the symmetric case then encounters an error and cannot continue. Specify symmetric=FALSE in the call of geigen; the manual describes what the argument symmetric does. Do this:
B <- diag(diag(Gamma_xi_0))
geigen(Gamma_chi_0, B, symmetric=FALSE, only.values=FALSE)
The result is (on my computer)
$values
[1] 4.312749e+02 -3.869203e+01 -2.328465e+01 1.706288e-05 1.840783e+01
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -0.067535068 1.0000000 0.2249715 -0.89744514 0.05194799
[2,] -0.035746438 0.1094176 0.3273440 0.03714518 1.00000000
[3,] 0.005083806 0.3782606 0.8588086 0.50306323 0.17858115
[4,] -1.000000000 0.2986963 0.4067701 -1.00000000 -0.48314183
[5,] -0.034226056 -0.6075727 1.0000000 -0.53017872 0.06738515
$alpha
[1] 1.365959e+00 -1.152686e+00 -9.202769e-01 4.352770e-07 5.588102e-01
$beta
[1] 0.003167259 0.029791306 0.039522893 0.025510167 0.030357208
This is quite close to what you show for Matlab. I know nothing about Matlab so I cannot help you with that.
Addendum
Matlab seems to use methods similar to geigen's, depending on whether the matrices are determined to be symmetric or not. Your matrix Gamma_chi_0 may not be exactly symmetric. See the Matlab documentation for the 'algorithm' argument of eig.
More addendum
In actual fact your matrix B is not positive definite: it has negative diagonal entries. Try the function chol of base R on it and you'll get the same error message. In this case you have to force geigen to use the general algorithm.
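A small check of that last point, using the diagonal of B shown above: chol() must fail because the first diagonal entry is negative, which is exactly the "leading minor of order 1" error geigen reported.

```r
# B = diag(diag(Gamma_xi_0)); a negative diagonal entry means B cannot
# be positive definite, so the Cholesky factorization fails.
B <- diag(c(-0.0234, 0.0329, -0.0509, 0.0022, -0.0335))
msg <- tryCatch(chol(B), error = function(e) conditionMessage(e))
stopifnot(is.character(msg))           # chol() raised an error, as expected
stopifnot(grepl("positive", msg))      # ...about positive definiteness
```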
I have this dataset:
dbppre dbppost per1pre per1post per2pre per2post
0.544331824055634 0.426482748529805 1.10388140870983 1.14622255457398 1.007302668 1.489675646
0.44544008292805 0.300746382647025 0.891104906479033 0.876840408251785 0.919450773 0.892276804
0.734783578764543 0.489971007532308 1.02796075709944 0.79655130374748 0.610340504 0.936092006
1.04113077142586 0.386513119551008 0.965359488375859 1.04314173155816 1.122001994 0.638452078
0.333368637355291 0.525460160226716 NA 0.633435747 1.196988457 0.396543005
1.76769244892893 0.726077921840058 1.08060419667991 0.974269083108835 1.245643507 1.292857474
1.41486783 NA 0.910710353033318 1.03435985624106 0.959985314 1.244732938
1.01932795229362 0.624195252685448 1.27809687379565 1.59656046306852 1.076534265 0.848544508
1.3919315726037 0.728230610741795 0.817900465495852 1.24505216554384 0.796182044 1.47318564
1.48912544220417 0.897585509143984 0.878534099910696 1.12148645028777 1.096723799 1.312244217
1.56801709691326 0.816474814896344 1.13655475536592 1.01299018097117 1.226607978 0.863016615
1.34144721808244 0.596169010679233 1.889775937 NA 1.094095173 1.515202105
1.17409999971024 0.626873517936125 0.912837009713984 0.814632450682884 0.898149331 0.887216585
1.06862027138743 0.427855128881696 0.727537839417515 1.15967069522768 0.98168375 1.407271061
1.50406121956726 0.507362673558659 1.780752715 0.658835953 2.008229626 1.231869338
1.44980944220763 0.620658801480513 0.885827192590202 0.651268425772394 1.067548223 0.994736445
1.27975202574336 0.877955236879164 0.595981804265367 0.56002696152466 0.770642278 0.519875921
0.675518080750329 0.38478948746306 0.822745530980815 0.796051785239611 1.16899539 1.16658889
0.839686262472682 0.481534573379965 0.632380676760052 0.656052506855686 0.796504954 1.035781891
.
.
.
As you can see, there are multiple quantitative variables for gene expression data, each gene measured twice, pre and post treatment, with some missing values in some of the variables.
Each row corresponds to one individual, and they are also divided in two groups (0 = control, 1 = really treated).
I would like to compute a correlation (Spearman or Pearson depending on normality), but by group, obtaining the correlation value and the p-value significance while avoiding the NAs.
Is it possible?
I know how to implement cor.test() function to compare two variables, but I could not find any variable inside this function to take groups into account.
I also discovered the plyr and data.table libraries, which can work by groups, but they return just the correlation value without the p-value, and I haven't been able to make them work for variables with NAs.
Suggestions?
You could use the Hmisc package.
library(Hmisc)
set.seed(10)
dt<-matrix(rnorm(100),5,5) #create matrix
dt[1,1]<-NA #introduce NAs
dt[2,4]<-NA #introduce NAs
corspear<-rcorr(dt, type="spearman") #Spearman correlation
corpear<-rcorr(dt, type="pearson") #Pearson correlation
> corspear
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0 0.4 0.2 0.5 -0.4
[2,] 0.4 1.0 0.1 -0.4 0.8
[3,] 0.2 0.1 1.0 0.4 0.1
[4,] 0.5 -0.4 0.4 1.0 -0.8
[5,] -0.4 0.8 0.1 -0.8 1.0
n
[,1] [,2] [,3] [,4] [,5]
[1,] 4 4 4 3 4
[2,] 4 5 5 4 5
[3,] 4 5 5 4 5
[4,] 3 4 4 4 4
[5,] 4 5 5 4 5
P
[,1] [,2] [,3] [,4] [,5]
[1,] 0.6000 0.8000 0.6667 0.6000
[2,] 0.6000 0.8729 0.6000 0.1041
[3,] 0.8000 0.8729 0.6000 0.8729
[4,] 0.6667 0.6000 0.6000 0.2000
[5,] 0.6000 0.1041 0.8729 0.2000
For further details see the help section: ?rcorr
rcorr returns a list with elements r, the matrix of correlations, n
the matrix of number of observations used in analyzing each pair of
variables, and P, the asymptotic P-values. Pairs with fewer than 2
non-missing values have the r values set to NA.
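For the by-group part of the question, base R's split() plus cor.test() also works and gives p-values; cor.test() drops incomplete pairs, which takes care of the NAs. A sketch with made-up data (the names df, group, g1, g2 are placeholders, not from your dataset):

```r
set.seed(1)
df <- data.frame(group = rep(c(0, 1), each = 10),  # 0 = control, 1 = treated
                 g1 = rnorm(20), g2 = rnorm(20))
df$g1[3] <- NA                                      # introduce a missing value

# One cor.test per group; pairs with an NA are dropped automatically
res <- lapply(split(df, df$group), function(d)
  cor.test(d$g1, d$g2, method = "spearman", exact = FALSE))
sapply(res, function(ct) c(rho = unname(ct$estimate), p = ct$p.value))
```

Looping this over the pre/post variable pairs then gives one correlation and p-value per gene per group.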
Sorry, newbie...
I've got an array object called "y" of 500 matrices of 6x6, like this:
, , 1
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0000 0.3627 0.4132 0.4231 0.3795 0.5444
[2,] 0.3627 0.0000 0.2084 0.3523 0.2310 0.5377
[3,] 0.4132 0.2084 0.0000 0.1984 0.2920 0.4774
[4,] 0.4231 0.3523 0.1984 0.0000 0.2787 0.4363
[5,] 0.3795 0.2310 0.2920 0.2787 0.0000 0.5129
[6,] 0.5444 0.5377 0.4774 0.4363 0.5129 0.0000
[...]
, , 500
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0000 0.3755 0.3568 0.3835 0.3582 0.5065
[2,] 0.3755 0.0000 0.0840 0.2253 0.2237 0.4066
[3,] 0.3568 0.0840 0.0000 0.1673 0.2434 0.4073
[4,] 0.3835 0.2253 0.1673 0.0000 0.2338 0.3403
[5,] 0.3582 0.2237 0.2434 0.2338 0.0000 0.4263
[6,] 0.5065 0.4066 0.4073 0.3403 0.4263 0.0000
I want to extract a specific position across all 500 matrices in the array and store these 500 values in a vector named "unouno" for further analyses.
I'm trying to do this:
for (i in 1:dim(y)[[3]]){
unouno<-y[2,1,i, drop=F]
}
but it only extracts the value for the last (500th) matrix.
(Once this is solved, I want to extract and store separately the 500 values of each of the 6x6 positions in the matrices.)
We can do this by leaving the 3rd dimension blank:
y[2,1,]
data
y <- array(1:60, dim=c(10,2,3))
If you would like to fix your loop, this could be one way to do it:
unouno <- NULL
for (i in 1:dim(y)[3]){
unouno[i]<-y[2,1,i]
}
It seems that you were missing the indexing on the vector unouno as well.
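For the follow-up (all 6x6 positions at once), apply() over the third margin flattens each slice into a column, so row k holds the values of position k across all slices. A sketch on the small stand-in array:

```r
y <- array(1:60, dim = c(10, 2, 3))   # small stand-in for the 6x6x500 array

unouno  <- y[2, 1, ]        # position [2,1] across all slices, no loop needed
all_pos <- apply(y, 3, c)   # 20 x 3 matrix: one row per matrix position

# position [i,j] lives in row (j-1)*nrow(y) + i of all_pos
stopifnot(identical(unouno, all_pos[2, ]))
```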
I wish to create a number of tables with numerical elements of very different scales. For example, tables with the variance of various variables down the diagonal, and the correlations off the diagonal. The larger numbers make the tables too big, and are harder to read.
Is there a good way using stargazer (or some other, similar package) to scale down the elements that are much larger, and indicate this with a foot note, or automatically use exponential notation?
For example, the following r code creates a matrix where the diagonal elements are much larger than the other elements.
library(stargazer)
x <- matrix(rnorm(25,0,1),5,5)
diag(x) <- rnorm(5,10000000,10)
stargazer(x, summary=F, digits=2)
Any help much appreciated.
Maybe you can do some adaptation of this:
ifelse(x < 100, sprintf("%0.2f", x), sprintf("%0.5e", x))
# [,1] [,2] [,3] [,4] [,5]
#[1,] "9.99999e+06" "-0.79" "-0.56" "0.91" "-2.57"
#[2,] "-0.13" "9.99999e+06" "-1.83" "-0.34" "1.73"
#[3,] "-0.48" "0.38" "1.00000e+07" "1.40" "-0.32"
#[4,] "-0.05" "-0.62" "0.91" "1.00000e+07" "1.15"
#[5,] "-0.09" "-0.33" "-0.16" "0.35" "9.99999e+06"
Or, without quotes:
noquote(ifelse(x < 100, sprintf("%0.2f", x), sprintf("%0.5e", x)))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 9.99999e+06 -0.79 -0.56 0.91 -2.57
#[2,] -0.13 9.99999e+06 -1.83 -0.34 1.73
#[3,] -0.48 0.38 1.00000e+07 1.40 -0.32
#[4,] -0.05 -0.62 0.91 1.00000e+07 1.15
#[5,] -0.09 -0.33 -0.16 0.35 9.99999e+06
You are essentially converting to text for printing in your desired way. For more info on output options, see ?sprintf.
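An alternative is formatC() with format = "g", which switches to scientific notation on its own once values get large, so you don't have to pick the threshold yourself:

```r
set.seed(1)
x <- matrix(rnorm(25, 0, 1), 5, 5)
diag(x) <- rnorm(5, 10000000, 10)

# format = "g" keeps fixed notation for small values and switches to
# scientific notation for large ones automatically
noquote(formatC(x, format = "g", digits = 3))
```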
I read about poly() in R and I think it should produce orthogonal polynomials, so that when we use it in a regression model like lm(y~poly(x,2)) the predictors are uncorrelated. However:
poly(1:3,2)=
[1,] -7.071068e-01 0.4082483
[2,] -7.850462e-17 -0.8164966
[3,] 7.071068e-01 0.4082483
This is probably a stupid question, but what I don't understand is that the column vectors of poly(1:3,2) don't seem to have an inner product of zero: -7.07*0.40 - 7.85*(-0.82) + 7.07*0.41 != 0. So how are these uncorrelated predictors for regression?
Your main problem is that you're missing the meaning of the "e" or "E notation": as @MamounBenghezal commented above, fffeggg is shorthand for fff * 10^(ggg).
I get slightly different answers than you do (the difference is numerically trivial) because I'm running this on a different platform:
pp <- poly(1:3,2)
## 1 2
## [1,] -7.071068e-01 0.4082483
## [2,] 4.350720e-18 -0.8164966
## [3,] 7.071068e-01 0.4082483
An easier format to see:
print(zapsmall(matrix(c(pp),3,2)),digits=3)
## [,1] [,2]
## [1,] -0.707 0.408
## [2,] 0.000 -0.816
## [3,] 0.707 0.408
sum(pp[,1]*pp[,2]) ## 5.196039e-17, effectively zero
Or to use your example, with the correct placement of decimal points:
-0.707*0.408-(7.85e-17)*(-0.82)+(0.707)*0.408
## [1] 5.551115e-17
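In other words, the columns of poly() really are orthonormal up to floating-point noise, which crossprod() confirms in one step:

```r
pp <- poly(1:3, 2)
g  <- crossprod(pp)   # t(pp) %*% pp; should be the 2x2 identity
stopifnot(max(abs(g - diag(2))) < 1e-12)
```

So the inner product you computed really is zero to machine precision, and lm(y ~ poly(x, 2)) does get uncorrelated predictors.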