YC is a 39 x 1 matrix.
XC is a 39 x 700 matrix.
First, I define a function that computes the coefficients by multiple linear regression, because I want the predicted y values for LOOCV. It fails with: Error in a + b1 %*% xp : non-conformable arrays.
mlr<- function(y,x,xp){
ybar<-mean(y)
xbar<-apply(x,1,mean)
xc<-scale(x,center = T, scale= FALSE)
b<-solve(t(xc)%*%xc)%*%t(xc)%*%y
b1<-t(b)
a<-ybar-b%*%xbar
xp<- matrix(nrow=2,ncol=1)
yp_LOOCV<-a+b1%*%xp
return(yp_LOOCV)
}
RMSECV<- sqrt(sum((total$YC - yp_LOOCV)^2)/39)
The second part plugs the function into a LOOCV loop to calculate the RMSE of the data. I did not use a package or the lm() command because I will reuse the loop later with PCA and PLR. Could you please tell me what else I should modify to correct the error?
YC<-data.frame(YC)
XPred<-cbind(total$X1590,total$X1724)
n <- 0
yp_LOOCV <- as.vector(0)
#Running the LOOCV iteratively ('for loop').
for (i in 1 : 39)
{n <- 1 + n
YC1 <-YC[-n,]
XC1<-XPred[-n,]
Yout<-YC[n]
Xout<-XPreduced[n]
yp_LOOCV<-mlr(YC1,XC1,Xout)
}
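A possible corrected version, offered as a sketch rather than a definitive fix. As far as I can tell, the error arises because apply(x, 1, mean) returns row means (one per observation), so b %*% xbar becomes a 2 x 38 matrix that cannot be added to the 1 x 1 product b1 %*% xp; in addition, xp is overwritten with an empty matrix and yp_LOOCV is overwritten on every pass of the loop. The helper name loocv_rmse and the simulated data below are assumptions for illustration only.

```r
# Predict y for the new row(s) xp from a least-squares fit on (x, y).
mlr <- function(y, x, xp) {
  x <- as.matrix(x)
  xp <- matrix(xp, ncol = ncol(x))        # one row per new observation
  ybar <- mean(y)
  xbar <- colMeans(x)                     # column means, one per predictor
  xc <- sweep(x, 2, xbar)                 # centre each predictor
  b <- solve(crossprod(xc), crossprod(xc, y - ybar))  # slopes, p x 1
  a <- ybar - sum(xbar * b)               # intercept
  drop(a + xp %*% b)
}

# Leave-one-out cross-validation: refit without row i, predict row i.
loocv_rmse <- function(y, x) {
  x <- as.matrix(x)
  n <- length(y)
  yp <- numeric(n)                        # one stored prediction per left-out row
  for (i in seq_len(n)) {
    yp[i] <- mlr(y[-i], x[-i, , drop = FALSE], x[i, ])
  }
  sqrt(mean((y - yp)^2))
}

# Simulated stand-in for YC and the two selected columns of XC.
set.seed(1)
x <- matrix(rnorm(39 * 2), nrow = 39)
y <- drop(1 + x %*% c(2, -1) + rnorm(39, sd = 0.1))
rmsecv <- loocv_rmse(y, x)
```

Note that with all 700 columns of XC and only 39 rows, t(xc) %*% xc is singular, so solve() would fail; the sketch assumes a small number of predictors, as in the two-column XPred.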
Related
I have a question on minimizing the sum of squared residuals to estimate "theta" in the below regression function. I intend not to use any built-in functions or packages in R, and write the iterative algorithm.
The regression function is: y_k=exp(-theta |x_k|)+e_k, for k=1,...,n
Here is my code, but it gives me the following error for some sets of x and y. Thanks in advance for your suggestions!
Error in if (abs(dif) < 10^(-5)) break :
missing value where TRUE/FALSE needed
Code:
theta <- -sum(log(abs(y)))/sum(abs(x))
#Alg:
rep <- 1
while(rep<=1000){
Ratio <- sum((abs(x)*exp(-theta*abs(x)))*(y-exp(-theta*abs(x))))/
sum((abs(x)^2*exp(-theta*abs(x)))*(y-2*exp(-theta*abs(x))))
if(is.na(Ratio)){
thetanew <- theta
}
else{
thetanew <- theta+Ratio
}
dif <- thetanew-theta
theta <- thetanew
if(abs(dif)<10^(-5)) break
rep=rep+1
}
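One plausible source of the NA, assuming the data contain values that make theta or Ratio non-finite: a y of exactly 0 gives log(0) = -Inf and therefore theta = Inf, and a zero denominator gives Ratio = Inf. is.na() does not catch Inf, so thetanew can become Inf, and on the next pass dif is Inf - Inf = NaN, which breaks the if(). The sketch below keeps the same Newton update but stops on any non-finite step. The function name fit_theta and its arguments are hypothetical.

```r
# Newton iteration for theta in y = exp(-theta * |x|) + e,
# guarded against non-finite starting values and steps.
fit_theta <- function(x, y, theta0, tol = 1e-5, max_iter = 1000) {
  if (!is.finite(theta0)) stop("starting value is not finite")
  ax <- abs(x)
  theta <- theta0
  for (iter in seq_len(max_iter)) {
    e <- exp(-theta * ax)
    # Same ratio as in the question: -S'(theta) / S''(theta)
    step <- sum(ax * e * (y - e)) / sum(ax^2 * e * (y - 2 * e))
    if (!is.finite(step)) {
      warning("non-finite Newton step; returning current theta")
      break
    }
    theta <- theta + step
    if (abs(step) < tol) break
  }
  theta
}

# Noise-free illustration with a known theta of 0.7 (made-up data).
set.seed(1)
x <- runif(50, -2, 2)
y <- exp(-0.7 * abs(x))
est <- fit_theta(x, y, theta0 = 0.5)
```

Checking is.finite(theta) right after computing the starting value from log(abs(y)) would show whether the problematic data sets are the ones containing zeros in y.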
I am trying to write a linear regression function in R for n parameters, but I don't know how to proceed. I have written a function for two variables.
new_lm <- function(y,x){
z=cbind(1,x)
k= solve(t(z)%*%z) %*% t(z) %*% y
return(k)
}
But in this case I pass the values explicitly. Suppose I want to use it for n parameters, so that the same function works for n = 1, 2, and so on.
Something like this might work:
new_lm <- function(y,...){
x <- do.call(cbind, list(...))
z <- cbind(1,x)
k <- solve(t(z)%*%z) %*% t(z) %*% y
return(k)
}
By the way, computing linear regressions this way is terrible in practice (although for small problems it will work fine); you should use QR or singular value decomposition, or some other more sophisticated bit of linear algebra ...
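The QR route mentioned above can be sketched with base R's qr() and qr.coef(), which avoid forming t(z) %*% z explicitly. The function name new_lm_qr and the simulated data are assumptions for illustration. Note also that cbind(1, x) already accepts a matrix x, so the original two-argument function handles any number of predictors when they are passed as columns of one matrix.

```r
# Least-squares coefficients via a QR decomposition of the design matrix.
new_lm_qr <- function(y, ...) {
  z <- cbind(1, ...)        # intercept column plus all supplied predictors
  qr.coef(qr(z), y)         # solves the least-squares problem from the QR factors
}

# Made-up data with two predictors.
set.seed(42)
x1 <- rnorm(30)
x2 <- rnorm(30)
y <- 3 + 2 * x1 - x2 + rnorm(30, sd = 0.2)

coefs <- new_lm_qr(y, x1, x2)            # predictors as separate vectors
coefs_mat <- new_lm_qr(y, cbind(x1, x2)) # or as one matrix
```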
I am using CVXR to solve a concave objective function. The decision variable (x) is one-dimensional and the objective function is the summation of 2 logarithmic terms in which the second term is exponential with different bases of “a and b” (e.g., a^x, b^x); “a and b” are constants.
My full objective function is:
(-x*sum(ln(y))) + ln((1-x)/((a^(1-x))-(b^(1-x))))
where y is a given 1-D vector of data.
When I add the second term having (a^x and b^x) to the objective function, I keep getting
Error in a^(1 - x): non-numeric argument to binary operator
Is there any atom function in CVXR that can be used to code constant^x?
Here is my code:
library(CVXR)
a <- 7
b <- 0.3
M=1000
x_i # is a given vector of 1-D data
x <- Variable(1)
nominator <- (1-x)
denominator <- (1/((a^(1-x))-(b^(1-x))))
obj <- (-x*sum(log(x_i)) + M*log(nominator/denominator)) # change M to the length of x_i later
constr <- list(x>0)
prob <- Problem(Maximize(obj), constr)
result <- solve(prob)
alpha_hat <- result$getValue(x)
Please tell me what I am doing wrong. I appreciate your help in advance.
Do some math:
2 = e^(log 2)
2^x = (e^(log 2))^x = e^(x * log 2)
So you can try:
denominator <- 1/(exp(log(a)*(1-x)) - exp(log(b)*(1-x)))
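The identity behind this rewrite can be checked numerically in base R with ordinary numbers in place of the CVXR variable. This is only a check of the algebra, using the constants from the question and an arbitrary grid of x values. Whether the rewritten objective then passes CVXR's DCP rules is a separate question that the sketch does not address.

```r
a <- 7
b <- 0.3
x <- seq(0.1, 0.9, by = 0.2)   # arbitrary test points, not from the question

# constant^(1 - x) written with exp() and log() only
lhs <- a^(1 - x) - b^(1 - x)
rhs <- exp(log(a) * (1 - x)) - exp(log(b) * (1 - x))

same <- isTRUE(all.equal(lhs, rhs))
```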
I am trying to reproduce the SPSS output for the significance of a linear trend among means when equal variances are not assumed.
I have gratefully used code from http://www-personal.umich.edu/~gonzo/coursenotes/file3.pdf to create a function for calculating separate variances, which based on my searching I understand as the “equal variances not assumed” output in SPSS.
My problem/goal:
I am only assessing polynomial orthogonal trends (mostly linear). I want to adapt the code creating the function so that the contrast argument can take pre-made contrast matrices rather than manually specifying the coefficients each time (room for typos!).
… I have tried those exact commands but receive Error in contrast %*% means : non-conformable arguments. I have played around with the code but I can’t get it to work.
Code for creating the function from the notes:
sepvarcontrast <- function(dv, group, contrast) {
means <- c(by(dv, group, mean))
vars <- c(by(dv, group, var))
ns <- c(by(dv, group, length))
ihat <- contrast %*% means
t.denominator <- sqrt(contrast^2 %*% (vars/ns))
t.welch <- ihat/ t.denominator
num.contrast <- ifelse(is.null(dim(contrast)),1,dim(contrast)[1])
df.welch <- rep(0, num.contrast)
if (is.null(dim(contrast))) contrast <- t(as.matrix(contrast))
for (i in 1:num.contrast) {
num <- (contrast[i,]^2 %*% (vars))^2
den <- sum((contrast[i,]^2 * vars)^2 / (ns-1))
df.welch[i] <- num/den
}
p.welch <- 2*(1- pt(abs(t.welch), df.welch))
result <- list(ihat = ihat, se.ihat = t.denominator, t.welch = t.welch,
df.welch = df.welch, p.welch = p.welch)
return(result)
}
I would like to be able to use the function like this:
# Create a polynomial contrast matrix for 5 groups, then save
contr.mat5 <- contr.poly(5)
# Calculate separate variance
sepvarcontrast(dv, group, contrast = contr.mat5)
I have tried those exact commands to see if they would work but receive Error in contrast %*% means : non-conformable arguments.
All suggestions are appreciated! I am still learning how to create a reprex...
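A likely cause, sketched with made-up group means: contr.poly(5) returns a 5 x 4 matrix with one contrast per column, while sepvarcontrast indexes contrasts by row (contrast[i, ]) and multiplies contrast %*% means, so it needs one contrast per row. Transposing the matrix, or passing a single column as a vector, makes the dimensions agree.

```r
contr.mat5 <- contr.poly(5)
dim(contr.mat5)                      # 5 4: groups in rows, contrasts in columns

means <- c(10, 12, 15, 19, 24)       # hypothetical group means

# One contrast per row: 4 x 5 times a length-5 vector gives 4 x 1.
ihat_all <- t(contr.mat5) %*% means

# Linear trend only: a plain vector of 5 coefficients also conforms.
ihat_linear <- contr.mat5[, 1] %*% means
```

Under that reading, the call would become sepvarcontrast(dv, group, contrast = t(contr.mat5)), or contrast = contr.mat5[, 1] for the linear trend alone.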
I am looking for a fast way to do nonnegative quantile and Huber regression in R (i.e. with the constraint that all coefficients are >= 0). I tried using the CVXR package for quantile & Huber regression and the quantreg package for quantile regression, but CVXR is very slow and quantreg seems buggy when I use nonnegativity constraints. Does anybody know of a good and fast solution in R, e.g. using the Rcplex package or R gurobi API, thereby using the faster CPLEX or gurobi optimizers?
Note that I need to run a problem size like below 80 000 times, whereby I only need to update the y vector in each iteration, but still use the same predictor matrix X. In that sense, I feel it's inefficient that in CVXR I now have to do obj <- sum(quant_loss(y - X %*% beta, tau=0.01)); prob <- Problem(Minimize(obj), constraints = list(beta >= 0)) within each iteration, when the problem is in fact staying the same and all I want to update is y. Any thoughts to do all this better/faster?
Minimal example:
## Generate problem data
n <- 7 # n predictor vars
m <- 518 # n cases
set.seed(1289)
beta_true <- 5 * matrix(stats::rnorm(n), nrow = n)+20
X <- matrix(stats::rnorm(m * n), nrow = m, ncol = n)
y_true <- X %*% beta_true
eps <- matrix(stats::rnorm(m), nrow = m)
y <- y_true + eps
Nonnegative quantile regression using CVXR :
## Solve nonnegative quantile regression problem using CVX
require(CVXR)
beta <- Variable(n)
quant_loss <- function(u, tau) { 0.5*abs(u) + (tau - 0.5)*u }
obj <- sum(quant_loss(y - X %*% beta, tau=0.01))
prob <- Problem(Minimize(obj), constraints = list(beta >= 0))
system.time(beta_cvx <- pmax(solve(prob, solver="SCS")$getValue(beta), 0)) # estimated coefficients; note that they can occasionally go slightly negative, so I clip at 0
# 0.47s
cor(beta_true,beta_cvx) # correlation=0.99985, OK but very slow
Syntax for nonnegative Huber regression is the same but would use
M <- 1 ## Huber threshold
obj <- sum(CVXR::huber(y - X %*% beta, M))
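For reference, the quant_loss helper used in the CVXR example is an algebraic rewrite of the usual check (pinball) loss: for u >= 0 it reduces to tau * u, and for u < 0 to (tau - 1) * u. The base R check below confirms this on a few arbitrary residual values; the pinball function is a hypothetical name introduced only for the comparison.

```r
# Form used in the CVXR objective above
quant_loss <- function(u, tau) 0.5 * abs(u) + (tau - 0.5) * u

# Textbook piecewise form of the quantile (check) loss
pinball <- function(u, tau) ifelse(u >= 0, tau * u, (tau - 1) * u)

u <- c(-2, -0.5, 0, 0.5, 2)          # arbitrary residuals
agree <- isTRUE(all.equal(quant_loss(u, 0.01), pinball(u, 0.01)))
```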
Nonnegative quantile regression using quantreg package :
### Solve nonnegative quantile regression problem using quantreg package with method="fnc"
require(quantreg)
R <- rbind(diag(n),-diag(n))
r <- c(rep(0,n),-rep(1E10,n)) # specify bounds of coefficients, I want them to be nonnegative, and 1E10 should ideally be Inf
system.time(beta_rq <- coef(rq(y~0+X, R=R, r=r, tau=0.5, method="fnc"))) # estimated coefficients
# 0.12s
cor(beta_true,beta_rq) # correlation=-0.477, no good, and even worse with tau=0.01...
To speed up CVXR, you can get the problem data once in the beginning, then modify it within a loop and pass it directly to the solver's R interface. The code for this is
prob_data <- get_problem_data(prob, solver = "SCS")
Then, parse out the arguments and pass them to scs from the scs library. (See Solver.solve in solver.R). You'll have to dig into the details of the canonicalization, but I expect if you're just changing y at each iteration, it should be a straightforward modification.