Suppose the following system of equations Ax = b with:
> A <- matrix(c(2,0,-1,0,0,2,2,1,-1,2,0,0,0,1,0,0), ncol = 4)
> A
[,1] [,2] [,3] [,4]
[1,] 2 0 -1 0
[2,] 0 2 2 1
[3,] -1 2 0 0
[4,] 0 1 0 0
> b <- c(-2,5,0,0)
Solving these equations with solve() yields:
> x <- solve(A,b)
> x
[1] 6.66e-16 4.44e-16 2.00e+00 1.00e+00
This is just an example, but A and b can be of any form.
I need to detect whether any component of x is 0. Now, the first two components should actually be 0, but both are greater than the machine epsilon .Machine$double.eps = 2.22e-16, which makes them very small, yet not equal to zero.
I think I understand that this is caused by rounding errors in floating-point arithmetic inside solve(). What I need to know is whether it is possible (from a practical point of view) to determine an upper bound on these errors, so that the 0s can be detected. For example, instead of
> x == 0
[1] FALSE FALSE FALSE FALSE
one would use something like this:
> x > -1e-15 & x < 1e-15
[1] TRUE TRUE FALSE FALSE
Giving more insight into this problem would be appreciated.
One way to approach this is to check whether we can find a better solution to the linear system if we assume those components to be zero. For that we want to solve A[,3:4] %*% y = b, since A %*% c(0,0,x[3],x[4]) = A[,3:4] %*% c(x[3],x[4]). This is an overdetermined system, so we can't use solve to find a solution. We can, however, use qr.solve:
> x.new = c(0,0,qr.solve(A[,3:4],b))
It remains to check if this solution is really better:
> norm(A%*%x.new - b) < norm(A%*%x - b)
[1] TRUE
Thus we have a good reason to suspect that x[1]==x[2]==0.
In this simple example it is obviously possible to guess the true solution by looking at the approximate solution:
> x.true = c(0,0,2,1)
> norm(A%*%x.true - b)
[1] 0
This is however not very helpful in the general case.
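For the original detection problem, a practical compromise is to compare against a tolerance scaled to the problem instead of against exact zero; base R's zapsmall(x) does something similar by rounding entries that are negligible relative to the largest one. A minimal sketch (the factor 100 and the scaling by max(abs(A)) are assumptions, not a rigorous error bound):
> tol <- 100 * .Machine$double.eps * max(abs(A))  # problem-scaled tolerance
> x[abs(x) < tol] <- 0                            # treat tiny entries as exact zeros
> x
[1] 0 0 2 1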
Related
In this question, a way was found to compute a particular solution to a non-square linear system that has infinitely many solutions. This leads to another question:
How to find all the solutions for a non-square linear system with infinitely many solutions, with R? (see below for a possible description of the infinite set of solutions)
Example: the linear system
x+y+z=1
x-y-2z=2
is equivalent to A X = B with:
A=matrix(c(1,1,1,1,-1,-2),2,3,T)
B=matrix(c(1,2),2,1,T)
A
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 -1 -2
B
[,1]
[1,] 1
[2,] 2
We can describe the infinite set of solutions with:
x = 3/2 + (1/2) z
y = -1/2 + (-3/2) z
z in R
Thus, R could describe the set of solutions this way:
> solve2(A,B)
$principal
[1] 1 2 # this means that x and y will be described
$free
[1] 3 # this means that the 3rd variable (i.e. z) is free in the set of real numbers
$P
[1] 1.5 -0.5
$Q
[1] 0.5 -1.5
This means that every solution can be created with:
z = 236782 # any value would be ok
solve2(A,B)$P + z * solve2(A,B)$Q # this gives x and y
About the maths: such a decomposition always exists when the linear system has infinitely many solutions, so this part is fine. The question is: is there something to do this in R?
You can solve equations like these using the generalized inverse of A.
library(MASS)
ginv(A) %*% B
# 1.2857143
# 0.1428571
#-0.4285714
A %*% ginv(A) %*% B
# 1
# 2
So, with help from @Bhas,
gen_soln <- function(vec) {
  G <- ginv(A)
  W <- diag(3) - G %*% A   # projector onto the null space of A
  (G %*% B + W %*% vec)    # particular solution plus a kernel component
}
You can now find as many solutions as you like by providing a vector of length 3 to the gen_soln function. For example,
one_from_inf <- gen_soln(1:3)
one_from_inf
#[1,] 1.35714286
#[2,] -0.07142857
#[3,] -0.28571429
# Test the solution.
A %*% one_from_inf
# [,1]
#[1,] 1
#[2,] 2
# Using random number generator
A %*% gen_soln(rnorm(3))
# [,1]
#[1,] 1
#[2,] 2
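Why gen_soln works: W = diag(3) - G %*% A maps any vector into the null space of A, so the W %*% vec term never changes A %*% x. A quick numerical check (a sketch, reusing A from above):
G <- ginv(A)
W <- diag(3) - G %*% A
max(abs(A %*% W))   # essentially zero: columns of W lie in the null space of A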
The general solution to
A*x = b
is
x = x0 + z
where x0 is any particular solution and z is any vector in the kernel (null space) of A.
As pointed out above you can find a particular solution x0 by using the generalised inverse. You can also use the SVD to find a basis for the kernel of A:
A = U*S*V'
where U and V are orthogonal and S diagonal, with, say, the last k entries on the diagonal 0 (and the others non-zero).
It follows that the last k columns of V form a basis for the kernel of A, and if we call these z1, ..., zk then the solutions of the original equation are
x = x0 + c1*z1 + ... + ck*zk
for any real c1, ..., ck.
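A minimal sketch of this recipe in R, using MASS::ginv for the particular solution x0 and base R's svd for the kernel basis (the rank tolerance 1e-10 is an assumption):
library(MASS)  # for ginv

A <- matrix(c(1, 1, 1, 1, -1, -2), 2, 3, byrow = TRUE)
B <- c(1, 2)

x0 <- ginv(A) %*% B             # a particular solution
s  <- svd(A, nv = ncol(A))      # request all right singular vectors
r  <- sum(s$d > 1e-10)          # numerical rank of A
Z  <- s$v[, seq(r + 1, ncol(A)), drop = FALSE]  # kernel basis z1, ..., zk

# every solution is x0 + Z %*% c for arbitrary coefficients c
A %*% (x0 + Z %*% rnorm(ncol(Z)))  # reproduces B up to rounding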
I'm trying to use constrOptim to optimize the sum of squared errors from a linear multiple regression. The main equation should be D = Beta1*Xa + Beta2*Xb + Beta3*Xc + Beta4*Xd, with D, Xa, Xb, Xc, Xd from an imported .csv file, and the Betas are the coefficients I want to find, minimizing the squared errors.
So far I imported the file.csv into R, named each column Ds, Xa, Xb, Xc, Xd, and created the objective function
objfunction(Beta1, Beta2, Beta3, Beta4) = (sum(D) - sum(Beta1*Xa + Beta2*Xb + Beta3*Xc + Beta4*Xd))^2
as my 'sum of squared errors'. I created the matrix 'C' and vector 'd' to configure the constraints that should restrict the Betas to <= 0. I don't know how to find the feasible region, although I've used initial values that made the function work.
Here is the code:
> Tabela= read.table("Simulacao.csv", header=T, sep= ";")
> Tabela
D A B C D.1
1 -1 1 -1 0 0
2 4 0 0 1 -1
3 4 1 0 -1 0
4 0 0 1 0 -1
5 -2 1 0 0 -1
> Ds= Tabela[,1]
> Xa= Tabela[,2]
> Xb= Tabela[,3]
> Xc= Tabela[,4]
> Xd= Tabela[,5]
> simulaf= function(x1,x2,x3,x4) {
+ Ds= Tabela[,1]
+ Xa= Tabela[,2]
+ Xb= Tabela[,3]
+ Xc= Tabela[,4]
+ Xd= Tabela[,5]
+ J=sum(Ds)
+ H=sum(x1*Xa+x2*Xb+x3*Xc+x4*Xd)
+ sx=(J-H)^2
+ return(sx)
+ }
> s= function(x) {simulaf(x[1],x[2],x[3],x[4])}
> d= c(0,0,0,0)
> C= matrix(c(-1,0,0,0,0,-1,0,0,0,0,-1,0,0,0,0,-1),nrow=4,ncol=4,byrow=T)
> constrOptim(c(-1,-1,-1,-1),s,NULL,C,d)
$par
[1] -0.2608199 -0.8981110 -1.1095961 -1.9274866
The result I expect should be:
$par
[1] -0.125 0 -0.5 -0.875
After researching this, my conclusion is that it could be because I'm using bad initial values, a parameterization problem (I don't understand why it's needed), or simply that I have programmed it incorrectly.
What do I need to do to fix this?
The formula for the sum of squared errors is
sum((y - yhat)^2)
and not
(sum(y) - sum(yhat))^2
where yhat is the predicted value.
Also, if your only constraints are that the estimated betas should be negative (which is a bit odd; usually you want them to be positive, but never mind), then you don't need constrOptim. Regular optim(method="L-BFGS-B") or nlminb will work with so-called box constraints.
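A minimal sketch of both fixes, reusing the Ds, Xa, ..., Xd columns defined in the question (the starting values are arbitrary):
# correct objective: square each residual, then sum
sse <- function(beta) {
  yhat <- beta[1]*Xa + beta[2]*Xb + beta[3]*Xc + beta[4]*Xd
  sum((Ds - yhat)^2)
}

# box constraint beta <= 0 expressed as upper bounds, no constrOptim needed
optim(c(-1, -1, -1, -1), sse, method = "L-BFGS-B", lower = -Inf, upper = 0)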
Presently, I am working through the above in the RStudio help file, which contains the following sample:
##
## rbprobitGibbs example
##
if(nchar(Sys.getenv("LONG_TEST")) != 0) {R=2000} else {R=10}
set.seed(66)
simbprobit = function(X,beta) {
## function to simulate from binary probit including x variable
y=ifelse((X%*%beta+rnorm(nrow(X)))<0,0,1)
list(X=X,y=y,beta=beta)
}
nobs=200
X=cbind(rep(1,nobs),runif(nobs),runif(nobs))
beta=c(0,1,-1)
nvar=ncol(X)
simout=simbprobit(X,beta)
Data1=list(X=simout$X,y=simout$y)
Mcmc1=list(R=R,keep=1)
out=rbprobitGibbs(Data=Data1,Mcmc=Mcmc1)
summary(out$betadraw,tvalues=beta)
if(0){
## plotting example
plot(out$betadraw,tvalues=beta)
}
When I step through the code, I don't see anywhere that the A matrix is set. It is only when I reach this line:
out=rbprobitGibbs(Data=Data1,Mcmc=Mcmc1)
That I see the A matrix displayed in the output, which I understand has to be a k * k matrix, where betabar is a k * 1 matrix.
Prior Parms:
betabar
# [1] 0 0 0
A
# [,1] [,2] [,3]
# [1,] 0.01 0.00 0.00
# [2,] 0.00 0.01 0.00
# [3,] 0.00 0.00 0.01
So I can understand how A gets its dimensions; however, what is not clear to me is how the values in A are set to 0.01. I am trying to figure out how I can allow a user calling the rbprobitGibbs function to set the precision via A to whatever they like. I can see where A is output, but how are its values derived from the input? Does anyone have any suggestions? TIA.
UPDATE:
Here is the output produced, but as far as I can determine it is identical whether I use prior = list(rep(0,3), .2*diag(3)) or not:
> out
$betadraw
[,1] [,2] [,3]
[1,] 0.3565099 0.6369436 -0.9859025
[2,] 0.4705437 0.7211755 -1.1955608
[3,] 0.1478930 0.6538157 -0.6989660
[4,] 0.4118663 0.7910846 -1.3919411
[5,] 0.0385419 0.9421720 -0.7359932
[6,] 0.1091359 0.7991905 -0.7731041
[7,] 0.4072556 0.5183280 -0.7993501
[8,] 0.3869478 0.8116237 -1.2831395
[9,] 0.8893555 0.5448905 -1.8526630
[10,] 0.3165972 0.6484716 -0.9857531
attr(,"class")
[1] "bayesm.mat" "mcmc"
attr(,"mcpar")
[1] 1 10 1
It gets this value from a scaling constant applied to the prior precision matrix. In the source, you will note that if you do not supply a prior precision, it generates a k x k identity matrix and multiplies it by the constant BayesmConstant.A (0.01, matching the output above). Nothing fancy here. The scaling constants for all of the various functions in bayesm can be found in the ./bayesm/R/bayesmConstants.R file.
if (is.null(Prior$A)) {
A = BayesmConstant.A * diag(nvar)
}
If you would like to supply your own constant, say 0.2, you can do so as follows: prior = list(rep(0,k), .2*diag(k)). You could even introduce some relational information into the prior.
Very late to the party, but I ran across this same issue and just figured it out. In order to change the A matrix and the betabar prior, you have to name them, since all of your other input arguments are named.
For example your code should be,
rbprobitGibbs(Data=Data1, Prior=list(betabar=betabar1, A=A1), Mcmc=Mcmc1)
If you do that, you are able to set your own values for betabar and A.
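Putting it together for the three-variable sample above (a sketch; the value 0.2 is an arbitrary choice of prior precision):
betabar1 <- rep(0, nvar)   # prior mean for beta
A1 <- 0.2 * diag(nvar)     # prior precision, replacing the 0.01 default
out2 <- rbprobitGibbs(Data = Data1,
                      Prior = list(betabar = betabar1, A = A1),
                      Mcmc = Mcmc1)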
I wrote a function to calculate the critical depth of water in a circular channel, where the flow (Q) and diameter (Dia) are given:
D_Critic <- function(Q, Dia) {
  g <- 9.81
  Diff <- 1
  Phi <- 0.01
  while (Diff >= 0.001) {
    A <- 16 * Q * sqrt((2 / g) * sin(Phi / 2))
    B <- Dia^5 / 2 * (Phi - sin(Phi))^3 / 2
    Diff <- A - B
    Phi <- Phi + 0.001
    Yc <- Dia / 2 * (1 - cos(Phi / 2))
  }
  return(Yc)
}
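Called with scalar arguments the function behaves as intended; for example (value matching the output further below):
D_Critic(2.5975 / 2, 1)
# [1] 0.52609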
Now I want to use the within function to bind Yc to the data frame DQ, but it returns only the first calculated Yc, plus several repeated warnings:
Q<-c(2.5975,2.5900,2.4183,2.3077)
D<-c(1,1,1,1)
DQ<-data.frame(Q,D)
> D_Q<-within(DQ,Yc<-D_Critic( Q/2, D))
There were 50 or more warnings (use warnings() to see the first 50)
> D_Q
Q D Yc
1 2.5975 1 0.52609
2 2.5900 1 0.52609
3 2.4183 1 0.52609
4 2.3077 1 0.52609
> warnings()
Warning messages:
1: In while (Diff >= 0.001) { ... :
the condition has length > 1 and only the first element will be used
The while statement only takes one boolean value, e.g. Diff >= 0.001, where Diff must be a single number. The first time you go through the loop this is the case, as Diff equals 1. In the second iteration, however, Diff becomes equal to A - B, where A and B are both vectors of length 4.
So, when your code reaches the second iteration, while generates a warning, as it is not sure how to deal with a vector of booleans. The choice it makes is to simply use the first element of the boolean vector, discarding the rest.
You need to consider what Diff actually is. Probably a single number, so something like Diff = sum(A - B) or Diff = sum((A - B)^2). This would result in a single Diff value and get rid of your warnings. What Diff should be exactly depends on the theory you are working with; your textbook should list this.
It was resolved with a trick:
Yc<-matrix(NA,length(DQ$Q),1)
for (i in 1:length(DQ$Q)) {
Yc[i,1]<- D_Critic(DQ$Q[i]/2,DQ$D[i])
}
> Yc
[,1]
[1,] 0.5260900
[2,] 0.5255907
[3,] 0.5163489
[4,] 0.5098512
DQ<-cbind(DQ,Yc)
> DQ
Q D Yc
1 2.5975 1 0.5260900
2 2.5900 1 0.5255907
3 2.4183 1 0.5163489
4 2.3077 1 0.5098512
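An equivalent, more idiomatic way to apply the scalar function element-wise is mapply, which avoids the explicit loop (a sketch; it produces the same Yc column as above):
DQ$Yc <- mapply(D_Critic, DQ$Q / 2, DQ$D)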
I'm trying to implement feature hashing in R to help me with a text classification problem, but I'm not sure if I'm doing it the way it should be done. Part of my code is based on this post: Hashing function for mapping integers to a given range?.
My code:
random.data = function(n = 200, wlen = 40, ncol = 10){
random.word = function(n){
paste0(sample(c(letters, 0:9), n, TRUE), collapse = '')
}
matrix(replicate(n, random.word(wlen)), ncol = ncol)
}
feature_hash = function(doc, N){
  doc = as.matrix(doc)
  library(digest)
  # md5-hash each word, take the last 5 hex digits, convert to an integer,
  # and reduce it modulo N + 1 to get a bucket index
  idx = matrix(strtoi(substr(sapply(doc, digest), 28, 32), 16L) %% (N + 1), ncol = ncol(doc))
  # count, for each row, how many words fall into each bucket 1..N
  sapply(1:N, function(r) apply(idx, 1, function(v) sum(v == r)))
}
set.seed(1)
doc = random.data(50, 16, 5)
feature_hash(doc, 3)
[,1] [,2] [,3]
[1,] 2 0 1
[2,] 2 1 1
[3,] 2 0 1
[4,] 0 2 1
[5,] 1 1 1
[6,] 1 0 1
[7,] 1 2 0
[8,] 2 0 0
[9,] 3 1 0
[10,] 2 1 0
So, I'm basically converting the strings to integers using the last 5 hex digits of the md5 hash returned by digest. Questions:
1 - Is there any package that can do this for me? I haven't found any.
2 - Is it a good idea to use digest as the hash function? If not, what can I do?
PS: I should test whether it works before posting, but my files are quite big and take a lot of processing time, so I think it's smarter to have someone point me in the right direction, because I'm sure I'm doing it wrong!
Thanks for any help on this!
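To make the conversion step concrete, here is what feature_hash does to a single word (a standalone sketch):
library(digest)
h <- digest("example")          # 32-character md5 hex string by default
strtoi(substr(h, 28, 32), 16L)  # integer from the last 5 hex digits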
I don't know of any existing CRAN package for this.
However, I wrote a package for myself to do feature hashing. The source code is here: https://github.com/wush978/FeatureHashing, but the API is different.
In my case, I use it to convert a data.frame to CSRMatrix, a customized sparse matrix in the package. I also implemented a helper function to convert the CSRMatrix to Matrix::dgCMatrix. For text classification, I guess the sparse matrix will be more suitable.
If you want to try it, please check the test script here: https://github.com/wush978/FeatureHashing/blob/master/tests/test-conver-to-dgCMatrix.R
Note that I have only used it on Ubuntu, so I don't know whether it works on Windows or Mac. Please feel free to ask me any questions about the package at https://github.com/wush978/FeatureHashing/issues.