How can I calculate the OLS coefficients in R without using the function lm()?
Formula: β = (X'X)^-1 X'y
X <- cbind(runif(1000000), rnorm(1000000), rchisq(1000000,50))
y <- 100 * X[,1] + 200 * X[,2] + rnorm(nrow(X), 0, 10)
I would be really grateful for help, since I have no idea how to do this.
It's basic linear algebra for OLS. You might want to have a look at https://en.wikipedia.org/wiki/Linear_regression
set.seed(123)
X <- cbind(runif(1000000), rnorm(1000000), rchisq(1000000,50))
y <- 100 * X[,1] + 200 * X[,2] + rnorm(nrow(X), 0, 10)
# (X'X)^-1*X'y
# basic matrix algebra
solve(t(X) %*% X) %*% (t(X) %*% y)
# crossprod() is faster; solving the linear system avoids forming the explicit inverse
solve(crossprod(X), crossprod(X, y))
# same in lm()
lm(y~0+X)
If your linear model has an intercept, add a column of ones to the design matrix:
x <- cbind(1, X)
# (X'X)^-1*X'y
solve(t(x) %*% x) %*% (t(x) %*% y)
solve(crossprod(x), crossprod(x, y))
lm(y~X)
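As a side note, explicitly inverting X'X can be numerically delicate for ill-conditioned designs; a QR decomposition (which is what lm() uses internally) is a common alternative. A minimal sketch with the intercept design matrix x from above:
# least squares via QR, with no explicit (X'X)^-1
qr_x <- qr(x)
qr.coef(qr_x, y)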
Here is my version, including gradient descent. Kudos also to this post.
x0 <- c(1,1,1,1,1) # Intercept
x1 <- c(1,2,3,4,5)
x2 <- c(8,4,3,1,8)
x <- as.matrix(cbind(x0,x1,x2))
y <- as.matrix(c(3,7,5,11,14))
# (X'X)^-1 X'y
beta1 <- solve(t(x) %*% x) %*% t(x) %*% y
# R's regression command
beta2 <- summary(lm(y ~ x[, 2:3]))
# Gradient descent
m <- nrow(y)
grad <- function(x, y, theta) {
  # gradient of the squared-error cost, averaged over the m observations
  gradient <- (1 / m) * (t(x) %*% ((x %*% t(theta)) - y))
  return(t(gradient))
}
# Define gradient descent update algorithm
grad.descent <- function(x, maxit) {
  theta <- matrix(c(0, 0, 0), nrow = 1) # initialize the parameters
  alpha <- .05                          # set learning rate
  for (i in 1:maxit) {
    theta <- theta - alpha * grad(x, y, theta)
  }
  return(theta)
}
# Results without feature scaling
print(grad.descent(x,2000))
beta1
beta2
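Gradient descent is sensitive to the scale of the features; a small sketch of the same run with the two non-intercept columns standardized (the resulting coefficients are on the scaled-feature scale, so they are not directly comparable to beta1):
# standardize x1 and x2 before running gradient descent
x.scaled <- cbind(x0, scale(x[, 2:3]))
print(grad.descent(x.scaled, 2000))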
Related
I want to write R code to compute the MLEs of (μ, σ², λ) with the Newton-Raphson algorithm, trying initial values λ = 0.5, ....
I'm trying to run the following code in R, but I'm getting an error.
I'm not sure which part of the formula is incorrect. Any help would be greatly appreciated.
obs.loglik <- function(y, theta) {
  mu <- theta[1]
  sigma2 <- theta[2]
  lambda <- theta[3]
  n <- length(y)
  -0.5*n*log(2*pi) - 0.5*n*log(sigma2) -
    sum((y^lambda - 1)/lambda - mu)^2/(2*sigma2) +
    (lambda - 1)*(sum(log(y)))
}
library(maxLik)
mle <- maxLik(obs.loglik, start=c(mean(y), var(y), 0.5))
# Error in fnOrig(theta, ...) :
#   argument "theta" is missing, with no default
If theta is fixed, we could treat it as a global variable instead of passing it in as an argument.
obs.loglik <- function(y) {
  mu <- theta[1]
  sigma2 <- theta[2]
  lambda <- theta[3]
  n <- length(y)
  -0.5*n*log(2*pi) - 0.5*n*log(sigma2) -
    sum((y^lambda - 1)/lambda - mu)^2/(2*sigma2) +
    (lambda - 1)*(sum(log(y)))
}
library(maxLik)
y <- 1:10 # example y
theta <- c(mean(y), var(y), 0.5) # Define theta outside
mle <- maxLik(obs.loglik, start=y)
If theta depends on y, we could do this instead.
obs.loglik <- function(y) {
  mu <- mean(y)     # depends on input
  sigma2 <- var(y)  # depends on input
  lambda <- theta[3]
  n <- length(y)
  -0.5*n*log(2*pi) - 0.5*n*log(sigma2) -
    sum((y^lambda - 1)/lambda - mu)^2/(2*sigma2) +
    (lambda - 1)*(sum(log(y)))
}
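Another option, not used above, is to keep theta as the argument being maximized and bind y inside a closure; the sketch below also writes the sum-of-squares term with the square inside the sum, which is the usual form of this Box-Cox style normal log-likelihood. Whether the optimizer converges still depends on the data and starting values.
library(maxLik)
# closure: y is fixed when the function is created; theta = (mu, sigma2, lambda)
# is what maxLik() optimizes over
make_loglik <- function(y) {
  n <- length(y)
  function(theta) {
    mu <- theta[1]
    sigma2 <- theta[2]
    lambda <- theta[3]
    -0.5 * n * log(2 * pi) - 0.5 * n * log(sigma2) -
      sum(((y^lambda - 1) / lambda - mu)^2) / (2 * sigma2) +
      (lambda - 1) * sum(log(y))
  }
}
y <- 1:10  # example y
mle <- maxLik(make_loglik(y), start = c(mean(y), var(y), 0.5))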
I am trying to calculate the standardized Pearson Residuals by hand in R. However, I am struggling when it comes to calculating the hat matrix.
I have built my own logistic regression and I am trying to calculate the standardized Pearson residuals in the logReg function.
logRegEst <- function(x, y, threshold = 1e-10, maxIter = 100)
{
  calcPi <- function(x, beta)
  {
    beta <- as.vector(beta)
    return(exp(x %*% beta) / (1 + exp(x %*% beta)))
  }

  beta <- rep(0, ncol(x))  # initial guess for beta
  diff <- 1000
  # initial value bigger than threshold so that we can enter our while loop
  iterCount <- 0
  # counter for the iterations to ensure we're not stuck in an infinite loop

  while (diff > threshold)  # tests for convergence
  {
    pi <- as.vector(calcPi(x, beta))
    # calculate pi by using the current estimate of beta
    W <- diag(pi * (1 - pi))
    # calculate the matrix of weights W as defined in the Fisher scoring algorithm
    beta_change <- solve(t(x) %*% W %*% x) %*% t(x) %*% (y - pi)
    # calculate the change in beta
    beta <- beta + beta_change  # new beta
    diff <- sum(beta_change^2)
    # calculate how much we changed beta by in this iteration;
    # if this is less than threshold, we'll break the while loop
    iterCount <- iterCount + 1
    # see if we've hit the maximum number of iterations
    if (iterCount > maxIter) {
      stop("This isn't converging.")
    }
    # stop if we have hit the maximum number of iterations
  }

  n <- length(y)
  df <- length(y) - ncol(x)
  # degrees of freedom: the length of y minus the number of x columns
  vcov <- solve(t(x) %*% W %*% x)
  logLik <- sum(y * log(pi / (1 - pi)) + log(1 - pi))
  deviance <- -2 * logLik
  AIC <- -2 * logLik + 2 * ncol(x)
  rank <- ncol(x)

  list(coefficients = beta, vcov = vcov, df = df, deviance = deviance,
       AIC = AIC, iter = iterCount - 1, x = x, y = y, n = n, rank = rank)
  # returning results
}
logReg <- function(formula, data)
{
  if (sum(is.na(data)) > 0) {
    print("missing values in data")
  } else {
    mf <- model.frame(formula = formula, data = data)
    # model.frame() returns a data.frame with the variables needed to use the
    # formula.
    x <- model.matrix(attr(mf, "terms"), data = mf)
    # model.matrix() creates a design matrix. That means that, for example, the
    # "Sex" variable is given as a dummy variable with ones and zeros.
    y <- as.numeric(model.response(mf)) - 1
    # model.response() gives us the response variable.
    est <- logRegEst(x, y)
    # Now we have the starting position to apply our function from above.
    est$formula <- formula
    est$call <- match.call()
    # We add the formula and the call to the list.
    nullModel <- logRegEst(x = as.matrix(rep(1, length(y))), y)
    est$nullDeviance <- nullModel$deviance
    est$nullDf <- nullModel$df
    mu <- exp(as.vector(est$x %*% est$coefficients)) /
      (1 + exp(as.vector(est$x %*% est$coefficients)))
    # computing the fitted values
    est$residuals <- (est$y - mu) / sqrt(mu * (1 - mu))
    est$mu <- mu
    est$x <- x
    est$y <- y
    est$data <- data
    hat <- (t(mu))^(1/2) %*% x %*% (t(x) %*% mu %*% x)^(-1) %*% t(x) %*% mu^(1/2)
    est$stdresiduals <- est$residuals / (sqrt(1 - hat))
    class(est) <- "logReg"
    # defining the class
    est
  }
}
I am struggling when it comes to calculating $H = \hat{V}^{1/2} X (X^\top \hat{V} X)^{-1} X^\top \hat{V}^{1/2}$. This is called hat in my code.
If I try to calculate the hat matrix (hat), I get the error that I cannot multiply the vector mu and the matrix x in this case: t(x) %*% mu %*% x.
I can see that the dimensions of the objects do not conform, and therefore I can't multiply them.
Can anyone see where my mistake is? Help is very much appreciated. Thanks!
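For reference, $\hat{V}$ here needs to be a diagonal matrix built from the fitted probabilities (the usual convention is $\hat{V} = \mathrm{diag}(\hat\mu(1-\hat\mu))$), not the bare vector mu, and the standardization uses only the diagonal of $H$. A minimal sketch in the notation of the question (x, y, and mu as computed inside logReg above), not a tested drop-in replacement:
# V-hat as a diagonal matrix built from the fitted probabilities
V <- diag(mu * (1 - mu))
Vhalf <- diag(sqrt(mu * (1 - mu)))
# H = V^(1/2) X (X' V X)^-1 X' V^(1/2); solve() gives the matrix inverse
H <- Vhalf %*% x %*% solve(t(x) %*% V %*% x) %*% t(x) %*% Vhalf
# standardized Pearson residuals use the leverages h_ii = diag(H)
std_res <- (y - mu) / sqrt(mu * (1 - mu) * (1 - diag(H)))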
I calculated the integral of the product of a Gaussian density and some function.
First, I did it with the function int2() (rmutil package).
And then, I did it with Gauss-Hermite points.
The two results I have obtained are different.
Should I consider the Gauss-Hermite result to be the correct one and the numerical integration an approximation?
I provide an example below:
1. rmutil::int2()
library(rmutil)
Sig <- matrix(c(0.2^2, 0, 0, 0.8^2), ncol = 2)
Mu <- c(2, 0)
to.integrate <- function(B0, B1) {
  first.int <- 1/0.8 * (1.2 * exp(B0 + B1 * 0.5))^(-1/0.8) * gamma(1/0.8)
  B <- matrix(c(B0, B1), ncol = 1)
  multi.norm <- 1 / (2 * pi * det(Sig)^(1/2)) *
    exp(-0.5 * t(B - Mu) %*% solve(Sig) %*% (B - Mu))
  return(first.int %*% multi.norm)
}
result_int2 <- int2(to.integrate, a = c(-Inf, -Inf), b = c(Inf, Inf),
                    eps = 1.0e-6, max = 16, d = 5)
2. Compute multivariate Gaussian quadrature points:
library(statmod)
mgauss.hermite <- function(n, mu, sigma) {
  dm <- length(mu)
  gh <- gauss.quad(n, 'hermite')
  gh <- cbind(gh$nodes, gh$weights)
  idx <- as.matrix(expand.grid(rep(list(1:n), dm)))
  pts <- matrix(gh[idx, 1], nrow(idx), dm)
  wts <- apply(matrix(gh[idx, 2], nrow(idx), dm), 1, prod)
  eig <- eigen(sigma)
  rot <- eig$vectors %*% diag(sqrt(eig$values))
  pts <- t(rot %*% t(pts) + mu)
  return(list(points = pts, weights = wts))
}
nod_wei <- mgauss.hermite(10, mu=Mu, sigma=Sig)
gfun <- function(B0, B1) {
  first.int <- 1/0.8 * (1.2 * exp(B0 + B1 * 0.5))^(-1/0.8) * gamma(1/0.8)
  return(first.int)
}
result_GH <- sum(gfun(nod_wei$points[, 1], nod_wei$points[, 2]) * nod_wei$weights)/pi
result_int2
result_GH
The mistake came from the way the points were calculated in the mgauss.hermite function.
I replaced the eigendecomposition of the Sigma matrix with a Cholesky decomposition multiplied by the square root of 2.
With that change, the results of the two methods became very similar.
Below is the corrected mgauss.hermite function:
mgauss.hermite <- function(n, mu, sigma) {
  dm <- length(mu)
  gh <- gauss.quad(n, 'hermite')
  gh <- cbind(gh$nodes, gh$weights)
  idx <- as.matrix(expand.grid(rep(list(1:n), dm)))
  pts <- matrix(gh[idx, 1], nrow(idx), dm)
  wts <- apply(matrix(gh[idx, 2], nrow(idx), dm), 1, prod)
  rot <- sqrt(2) * t(chol(sigma))  # Cholesky factor scaled by sqrt(2)
  pts <- t(rot %*% t(pts) + mu)
  return(list(points = pts, weights = wts))
}
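With the corrected nodes and weights, the quadrature sum can be recomputed exactly as before; the division by pi is the usual normalization for a two-dimensional Gauss-Hermite rule, and the result should now be close to result_int2:
nod_wei <- mgauss.hermite(10, mu = Mu, sigma = Sig)
result_GH <- sum(gfun(nod_wei$points[, 1], nod_wei$points[, 2]) * nod_wei$weights) / pi
result_GH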
I'm having trouble optimizing a multivariate normal log-likelihood in R. If anyone has a good solution for that, please let me know. Specifically, I cannot seem to keep the variance-covariance matrix positive-definite and the parameters in a reasonable range.
Let me introduce the problem more completely. I am essentially trying to simultaneously solve these two regression equations using MLE:
$$
y_1 = \beta_1 + \beta_2 x_1 + \beta_3 x_2 \\
y_2 = \beta_4 + \beta_3 x_1 + \beta_5 x_2
$$
The fact that $\beta_3$ is in both equations is not a mistake. I try to solve this using MLE by maximizing the likelihood of the multivariate normal distribution for $Y = (y_1, y_2)^\top$ where the mean is parameterized as above in the regression equations.
I've attached the log-likelihood function as I believe it should be, where I constrain the variance-covariance matrix to be positive definite by rebuilding it from necessarily positive eigenvalues and a Cholesky factor.
mvrestricted_ll <- function(par, Y, X) {
  # Indices
  n <- nrow(X)
  nbetas <- (2 + 3 * (ncol(Y) - 1))

  # Extract parameters
  beta <- par[1:nbetas]
  eigvals <- exp(par[(nbetas + 1):(nbetas + ncol(Y))])  # constrain to be positive
  chole <- par[(nbetas + ncol(Y) + 1):(nbetas + ncol(Y) + ncol(Y) * (ncol(Y) + 1) / 2)]

  # Build Sigma from positive eigenvalues and Cholesky factor (should be pos. def.)
  L <- diag(ncol(Y))
  L[lower.tri(L, diag = TRUE)] <- chole
  Sigma <- diag(eigvals) + tcrossprod(L)

  # Linear predictor
  # Hard coded for the 2x2 example for now
  mu <- cbind(beta[1] + beta[2] * X[, 1] + beta[3] * X[, 2],
              beta[4] + beta[3] * X[, 1] + beta[5] * X[, 2])

  yminmu <- Y - mu
  nlogs <- n * log(det(Sigma))
  invSigma <- solve(Sigma)
  meat <- yminmu %*% tcrossprod(invSigma, yminmu)
  return(-nlogs - sum(diag(meat)))
}
# Create fake data
n <- 1000
p <- 2
set.seed(20160201)
X <- matrix(rnorm(n*p), nrow = n)
set.seed(20160201)
Y <- matrix(rnorm(n*p), nrow = n)
# Initialize parameters
initpars <- c(rep(0, (2 + 3 * (ncol(Y) - 1)) + ncol(Y) + ncol(Y)*(ncol(Y)+1)/2))
# Optimization fails with BFGS
optim(par = initpars, fn = mvrestricted_ll, X = X, Y = Y, method = "BFGS")
# optim() does not converge with Nelder-Mead; if you increase maxit it also fails
optim(par = initpars, fn = mvrestricted_ll, X = X, Y = Y)
Any help would be greatly appreciated.
EDIT: I should note that just letting Sigma be a vector in the parameters and then returning a very large value whenever it is not positive definite does not work either.
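Regarding the EDIT: one standard way to keep Sigma positive definite without penalty tricks is a log-Cholesky parameterization, i.e. optimizing over a Cholesky factor whose diagonal is stored on the log scale. A minimal sketch of just that building block (this is an alternative construction, not what the code above does):
# any real vector par_chol of length d*(d+1)/2 maps to a positive definite Sigma,
# because L is triangular with a strictly positive diagonal
chol_to_sigma <- function(par_chol, d) {
  L <- matrix(0, d, d)
  L[lower.tri(L, diag = TRUE)] <- par_chol
  diag(L) <- exp(diag(L))
  tcrossprod(L)  # Sigma = L %*% t(L)
}
chol_to_sigma(c(0.1, -0.3, 0.2), 2)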
I have no idea if the code/answer is correct, but adding
invSigma <- try(solve(Sigma))
if (inherits(invSigma, "try-error")) return(NA)
and running
optim(par = initpars, fn = mvrestricted_ll, X=X, Y=Y,
control = list(maxit = 1e5))
gets me a little farther, to a convergence code of 10 (degenerate Nelder-Mead simplex).
$par
[1] 1.361612e+01 4.674349e+01 -3.050170e+01 3.305013e+01 6.731194e+01
[6] -3.117192e+01 -5.408598e+00 -6.326897e-07 -1.987449e+01 -1.795924e+01
$value
[1] -1.529013e+19
$counts
function gradient
1219 NA
$convergence
[1] 10
I suspect that a real solution will involve looking more carefully at the code to see if it's really doing what you think it's doing (sorry); understanding why solve() errors occur might be a good first step. You can work on troubleshooting this by putting a cat(par, "\n") as the first line of the function and running it without the try/NA-return code. That will allow you to isolate an example data set that throws the error — then you can work your way through your code a line at a time (with debug() or by hand) to see what's happening.
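A minimal sketch of that cat() suggestion, written here as a wrapper around the question's mvrestricted_ll so the original function stays untouched (combine it with the try()/NA idea above or not, as needed):
ll_debug <- function(par, Y, X) {
  cat(par, "\n")               # print the current parameter vector
  mvrestricted_ll(par, Y, X)   # then evaluate the original objective
}
optim(par = initpars, fn = ll_debug, X = X, Y = Y)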
You can consider using the following approach:
library(DEoptim)
fn <- function(par, mat_X, mat_Y)
{
  X <- mat_X
  Y <- mat_Y
  n <- nrow(X)
  nbetas <- (2 + 3 * (ncol(Y) - 1))
  beta <- par[1:nbetas]
  eigvals <- exp(par[(nbetas + 1):(nbetas + ncol(Y))])
  chole <- par[(nbetas + ncol(Y) + 1):(nbetas + ncol(Y) + ncol(Y) * (ncol(Y) + 1) / 2)]
  L <- diag(ncol(Y))
  L[lower.tri(L, diag = TRUE)] <- chole
  Sigma <- tryCatch(diag(eigvals) + tcrossprod(L), error = function(e) NA)

  if (is.null(dim(Sigma))) {
    return(10^30)
  } else {
    mu <- cbind(beta[1] + beta[2] * X[, 1] + beta[3] * X[, 2],
                beta[4] + beta[3] * X[, 1] + beta[5] * X[, 2])
    yminmu <- Y - mu
    nlogs <- n * log(det(Sigma))
    invSigma <- tryCatch(solve(Sigma), error = function(e) NA)

    if (is.null(dim(invSigma))) {
      return(10^30)
    } else {
      meat <- yminmu %*% tcrossprod(invSigma, yminmu)
      log_Lik <- -nlogs - sum(diag(meat))

      if (is.na(log_Lik) | is.nan(log_Lik) | is.infinite(log_Lik)) {
        return(10^30)
      } else {
        return(-log_Lik)
      }
    }
  }
}
n <- 1000
p <- 2
set.seed(20160201)
mat_X <- matrix(rnorm(n * p), nrow = n)
set.seed(2436537)
mat_Y <- matrix(rnorm(n * p), nrow = n)
lower <- rep(-10, 10)
upper <- rep(10, 10)
DEoptim(fn = fn, lower = lower, upper = upper,
control = list(itermax = 10000, parallelType = 1), mat_X = mat_X, mat_Y = mat_Y)
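The fitted parameters can then be read off the object that DEoptim() returns; a short usage note, assuming the call above is assigned to res:
res <- DEoptim(fn = fn, lower = lower, upper = upper,
               control = list(itermax = 10000, parallelType = 1),
               mat_X = mat_X, mat_Y = mat_Y)
res$optim$bestmem  # best parameter vector found
res$optim$bestval  # objective value (the negated log-likelihood) at that point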
I am not able to apply the ucminf() function to minimise my cost function in R.
Here is my cost function:
costfunction <- function(X, y, theta) {
  m <- length(y)
  J <- 1/m * ((-t(y) %*% log(sigmoid(as.matrix(X) %*% as.matrix(theta)))) -
                ((1 - t(y)) %*% log(1 - sigmoid(as.matrix(X) %*% as.matrix(theta)))))
  return(J)
}
Here is my sigmoid function:
sigmoid <- function(t) {
  1 / (1 + exp(-t))
}
Here is my gradient function:
gradfunction <- function(X, y, theta) {
  m <- length(y)
  # gradient of the logistic cost: (1/m) * X' (sigmoid(X theta) - y)
  grad <- 1/m * t(as.matrix(X)) %*% (sigmoid(as.matrix(X) %*% as.matrix(theta)) - y)
  return(grad)
}
I am trying to do the following:
library("ucminf")
data <- read.csv("ex2data1.txt",header=FALSE)
X <<- data[,c(1,2)]
y <<- data[,3]
qplot(X[,1],X[,2],colour=factor(y))
m <- dim(X)[1]
n <- dim(X)[2]
X <- cbind(1,X)
initial_theta <<- matrix(0,nrow=n+1,ncol=1)
cost <- costfunction(X,y,initial_theta)
grad <- gradfunction(X,y,initial_theta)
This is where I want to call ucminf() to find the minimum cost and the corresponding values of theta, but I am not sure how to do this.
Looks like you are trying to do the Week 2 exercise of the Coursera Machine Learning course.
There is no need for the ucminf package here; you can simply use the base R function optim(), and it works well.
We will define the sigmoid and cost functions first.
sigmoid <- function(z)
1 / (1 + exp(-z))
costFunction <- function(theta, X, y) {
m <- length(y)
J <- -(1 / m) * crossprod(c(y, 1 - y),
c(log(sigmoid(X %*% theta)), log(1 - sigmoid(X %*% theta))))
grad <- (1 / m) * crossprod(X, sigmoid(X %*% theta) - y)
list(J = J, grad = grad)
}
Let's load the data now. To make this code reproducible, I put the data in my Dropbox.
download.file("https://dl.dropboxusercontent.com/u/8750577/ex2data1.txt",
method = "curl", destfile = "/tmp/ex2data1.txt")
data <- matrix(scan('/tmp/ex2data1.txt', what = double(), sep = ","),
ncol = 3, byrow = TRUE)
X <- data[, 1:2]
y <- data[, 3, drop = FALSE]
m <- nrow(X)
n <- ncol(X)
X <- cbind(1, X)
initial_theta = matrix(0, nrow = n + 1)
We can then compute the value of the cost function at the initial theta like this:
cost <- costFunction(initial_theta, X, y)
(grad <- cost$grad)
## [,1]
## [1,] -0.100
## [2,] -12.009
## [3,] -11.263
(cost <- cost$J)
## [,1]
## [1,] 0.69315
Finally, we can use optim() to get the optimal theta:
res <- optim(par = initial_theta,
fn = function(t) costFunction(t, X, y)$J,
gr = function(t) costFunction(t, X, y)$grad,
method = "BFGS", control = list(maxit = 400))
(theta <- res$par)
## [,1]
## [1,] -25.08949
## [2,] 0.20566
## [3,] 0.20089
(cost <- res$value)
## [1] 0.2035
If you have a problem with the function download.file(), the data can also be downloaded here.
As you did not provide a reproducible example, it is hard to give you exactly the code you need, but the general idea is to hand the functions over to ucminf():
ucminf(start, costfunction, gradfunction, y = y, theta = initial_theta)
Note that start needs to be a vector of initial starting values which, when handed over as X to the two functions, produces a result. Usually you would use random starting values (e.g., from runif()).
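For this particular problem the quantity being optimized is theta rather than X, so in practice you would wrap the question's functions so that theta comes first; a minimal sketch using the objects defined in the question (untested against the original ex2data1.txt):
library(ucminf)
# wrappers: ucminf() optimizes over the first argument of fn/gr
fn <- function(theta) as.numeric(costfunction(X, y, theta))
gr <- function(theta) as.vector(gradfunction(X, y, theta))
fit <- ucminf(par = as.vector(initial_theta), fn = fn, gr = gr)
fit$par    # estimated theta
fit$value  # cost at the optimum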