I have a data set with response variable ADA, and independent variables LEV, ROA, and ROAL. The data is called dt. I used the following code to get coefficients for latent classes.
library(flexmix)
m1 <- stepFlexmix(ADA ~ LEV + ROA + ROAL, data = dt,
                  control = list(verbose = 0), k = 1:5, nrep = 10)
m1 <- getModel(m1, "BIC")
All was fine until I read the following in the flexmix documentation at http://rss.acs.unt.edu/Rdoc/library/flexmix/html/flexmix.html:
model: Object of class FLXM or a list of FLXM objects. Default is the object returned by calling FLXMRglm().
I think this says that the default model is a generalized linear model, while I am interested in a linear model. How can I use a linear model rather than a GLM? I searched for quite a while, but couldn't find anything except this example from
http://www.inside-r.org/packages/cran/flexmix/docs/flexmix, which I couldn't make sense of:
data("NPreg", package = "flexmix")
## mixture of two linear regression models. Note that control parameters
## can be specified as named list and abbreviated if unique.
ex1 <- flexmix(yn~x+I(x^2), data=NPreg, k=2,
control=list(verb=5, iter=100))
ex1
summary(ex1)
plot(ex1)
## now we fit a model with one Gaussian response and one Poisson
## response. Note that the formulas inside the call to FLXMRglm are
## relative to the overall model formula.
ex2 <- flexmix(yn~x, data=NPreg, k=2,
model=list(FLXMRglm(yn~.+I(x^2)),
FLXMRglm(yp~., family="poisson")))
plot(ex2)
Could someone please let me know how to use linear regression instead of a GLM? Or am I already using a linear model and just got confused by the "default model" line in the documentation? Please explain. Thanks.
I did a numerical experiment to check whether
m1 <- stepFlexmix(ADA ~ LEV + ROA + ROAL, data = dt, control = list(verbose = 0), k = 1:5, nrep = 10)
produces the results of linear regression. I ran the code below and found that, yes, the estimated parameters are indeed those of linear regression. The experiment allayed my reservations.
x1 <- 1:200
x2 <- x1 * x1
x3 <- x1 * x2
e1 <- rnorm(200, 0, 1)
e2 <- rnorm(200, 0, 1)
y1 <- 5 + 12 * x1 + 20 * x2 + 30 * x3 + e1
y2 <- 18 + 5 * x1 + 10 * x2 + 15 * x3 + e2
y   <- c(y1, y2)
x11 <- c(x1, x1)
x22 <- c(x2, x2)
x33 <- c(x3, x3)
d <- data.frame(y, x11, x22, x33)

m <- stepFlexmix(y ~ x11 + x22 + x33, data = d,
                 control = list(verbose = 0), k = 1:5, nrep = 10)
m <- getModel(m, "BIC")
parameters(m)
plotEll(m, data = d)

m.refit <- refit(m)
summary(m.refit)
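For completeness, the Gaussian (i.e. linear regression) assumption can also be made explicit by passing the model driver yourself. As far as I can tell, FLXMRglm() defaults to family = "gaussian", so the sketch below should give an equivalent fit (up to the randomness of the EM restarts):
library(flexmix)

# explicitly request the Gaussian family; within each latent class this is
# ordinary linear regression fitted by maximum likelihood
m.explicit <- stepFlexmix(y ~ x11 + x22 + x33, data = d,
                          model = FLXMRglm(family = "gaussian"),
                          control = list(verbose = 0), k = 1:5, nrep = 10)
m.explicit <- getModel(m.explicit, "BIC")
parameters(m.explicit)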
I am new to modeling in R, so I'm stumbling a bit...
I have a model in EViews that I have to translate to R and then extend.
The model is a multiple OLS regression with AR(1) residuals.
I implemented it like this:
model1 <- lm(y ~ x1 + x2 + x3, data)
data$e <- dplyr::lag(residuals(model1), 1)
model2 <- lm(y ~ x1 + x2 + x3 + e, data)
My issue is the same as in this thread, and I expected it: while the parameter estimates are similar, they differ enough that I cannot use them.
I am planning to use arima() from the stats package, but the problem is the implementation. How do I put an AR(1) structure on the residuals while keeping the other variables as they are?
Provided I understood you correctly, you can supply external regressors to your arima model through the xreg argument.
You don't provide sample data, so I don't have anything to play with, but your model should translate to something like:
model <- arima(data$y, xreg = as.matrix(data[, c("x1", "x2", "x3")]), order = c(1, 0, 0))
Explanation: the first argument, data$y, contains your time series data. xreg contains your external regressors as a matrix, with every column holding as many observations of that regressor as you have time points. order = c(1, 0, 0) specifies an AR(1) model for the errors.
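If it helps, here is a minimal runnable sketch with simulated data; the names y, x1, x2 and x3 are just stand-ins for your actual variables:
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
e  <- arima.sim(model = list(ar = 0.5), n = n)   # AR(1) disturbances
y  <- 1 + 2 * x1 - 0.5 * x2 + 0.25 * x3 + e
data <- data.frame(y, x1, x2, x3)

model <- arima(data$y,
               xreg  = as.matrix(data[, c("x1", "x2", "x3")]),
               order = c(1, 0, 0))
model   # reports the AR(1) coefficient, the intercept and the regression coefficients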
I have performed linear regression (lm) on two types of adjusted p-values: q-values and Benjamini-Hochberg. The results give two astronomical outliers; however, after removing those, new outliers always appear. Could someone please run the code and see whether the issue persists? What could be the source of the issue?
Here is the full code for easy copy/paste:
library(qvalue)

# simulate 50 p-values: 10 small ones from Beta(1, 100), the rest uniform (null)
p <- 50
m <- 10
pval <- c(rbeta(m, 1, 100), runif(p - m, 0, 1))

# adjust with Benjamini-Hochberg and with the q-value method
BHpval <- p.adjust(pval, method = "BH")
qval_ <- qvalue(pval)
print(qval_$pi0)

# regress one adjusted p-value type on the other and look at the diagnostic plots
fit2 <- lm(qval_$qvalues ~ BHpval)
plot(fit2)
Let me state my confusion with the help of an example:
# making datasets
x1 <- iris[, 1]
x2 <- iris[, 2]
x3 <- iris[, 3]
x4 <- iris[, 4]
dat  <- data.frame(x1, x2, x3)
dat2 <- dat[1:120, ]
dat3 <- dat[121:150, ]

# Using a linear model to fit x4 from x1, x2 and x3, where the training set is the first 120 obs.
model <- lm(x4[1:120] ~ x1[1:120] + x2[1:120] + x3[1:120])

# Using the coefficient values from summary(model), the prediction is done for the next 30 obs.
-.17947 - .18538 * x1[121:150] + .18243 * x2[121:150] + .49998 * x3[121:150]

# The same prediction is done using the function "predict"
predict(model, dat3)
My confusion is this: the two sets of predictions for the last 30 values differ, perhaps only slightly, but they do differ. Why is that? Shouldn't they be exactly the same?
The difference is really small, and I think it is just due to the precision of the coefficients you are using (e.g. the actual value of the intercept is -0.17947075338464965610..., not simply -.17947).
In fact, if you take the stored coefficient values and apply the formula, the result is equal to that of predict():
intercept <- model$coefficients[1]
x1Coeff <- model$coefficients[2]
x2Coeff <- model$coefficients[3]
x3Coeff <- model$coefficients[4]
intercept + x1Coeff*x1[121:150] + x2Coeff*x2[121:150] + x3Coeff*x3[121:150]
You can clean up your code a bit. To create your training and test datasets, use the following:
# create training and test datasets
train.df <- iris[1:120, 1:4]
test.df  <- iris[-(1:120), 1:4]

# fit a linear model to predict Petal.Width using all predictors
fit <- lm(Petal.Width ~ ., data = train.df)
summary(fit)

# predict Petal.Width in the test set using the linear model
predictions <- predict(fit, test.df)

# create a function mse() to calculate the Mean Squared Error
mse <- function(predictions, obs) {
  sum((obs - predictions) ^ 2) / length(predictions)
}

# measure the quality of fit
mse(predictions, test.df$Petal.Width)
The reason your predictions differ is that predict() uses the full-precision coefficients, whereas your "manual" calculation uses only five decimal places. The summary() function doesn't display the complete values of the coefficients; it rounds them to make the output more readable.
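If you want to see the coefficients at (close to) full precision rather than the rounded values in the summary output, you can print them with more digits:
print(coef(fit), digits = 15)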
I can extract the p-values for my slope & intercept from an ols object this way:
library(rms)
m1 <- ols(wt ~ cyl, data= mtcars, x= TRUE, y= TRUE)
coef(summary.lm(m1))
But when I try the same thing with a robcov object, summary.lm gives me the p-values from the original model (m1), not the robcov model:
m2 <- robcov(m1)
m2
coef(summary.lm(m2))
I think this must be related to the Warning from the robcov help page,
Warnings
Adjusted ols fits do not have the corrected standard errors printed
with print.ols. Use sqrt(diag(adjfit$var)) to get this, where adjfit
is the result of robcov.
but I'm not sure how.
Is there a way to extract the p-values from a robcov object? (I'm really only interested in the one for the slope, if that makes a difference...)
Hacking through print.ols and prModFit, I came up with this.
errordf <- m2$df.residual                 # residual degrees of freedom
beta    <- m2$coefficients                # coefficient estimates
se      <- sqrt(diag(m2$var))             # robust standard errors from the adjusted covariance matrix
Z       <- beta / se                      # t statistics
P       <- 2 * (1 - pt(abs(Z), errordf))  # two-sided p-values
Replace m2 with your own robcov model.
Try it for yourself by comparing the values in P with the output of print(m2).
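For convenience, you can collect everything in a small table and pull out just the slope p-value; with the mtcars example above the coefficient names are "Intercept" and "cyl":
res <- data.frame(beta = beta, se = se, t = Z, p = P)
res
res["cyl", "p"]   # p-value for the slope only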
I have 4 dimensions of data. In R, I'm using plot3d with the 4th dimension shown as color. I'd now like to use an SVM to find the regression that gives me the best correlation: basically, a best-fit hyperplane dependent on the color dimension. How can I do this?
This is the basic idea (of course the specific formula will vary depending on your variable names and on which one is the dependent variable):
library(e1071)
data = data.frame(matrix(rnorm(100*4), nrow=100))
fit = svm(X1 ~ ., data=data)
Then you can use the regular summary, plot, predict, etc. functions on the fit object. Note that with SVMs the hyper-parameters usually need to be tuned for best results; you can do this with the tune wrapper, as in the sketch below. Also check out the caret package, which I think is great.
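For example, a rough sketch of tuning cost and gamma with tune(), cross-validating over an arbitrary small grid (re-creating the toy data from above):
library(e1071)

set.seed(1)
data <- data.frame(matrix(rnorm(100 * 4), nrow = 100))

# grid-search cost and gamma by cross-validation
tuned <- tune(svm, X1 ~ ., data = data,
              ranges = list(cost = 2^(-2:4), gamma = 2^(-4:0)))
summary(tuned)
fit.best <- tuned$best.model   # the svm refit with the best parameter combination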
Take a look at the svm() function in the e1071 package.
You can also consider the kernlab, klaR or svmpath packages.
EDIT: #CodeGuy, John has provided you with an example. I suppose your 4 dimensions are features that you use to classify your data, and that you also have another variable that is the real class.
y  <- gl(4, 5)
x1 <- c(0, 1, 2, 3)[y]
x2 <- c(0, 5, 10, 15)[y]
x3 <- c(1, 3, 5, 7)[y]
x4 <- c(0, 0, 3, 3)[y]
d  <- data.frame(y, x1, x2, x3, x4)

library(e1071)
svm01 <- svm(y ~ x1 + x2 + x3 + x4, data = d)
ftable(predict(svm01), y)   # shows how well your SVM performs
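If, as in the original question, the fourth ("color") dimension is continuous, svm() will fit a regression (eps-regression) as soon as the response is numeric. A small sketch with made-up data, where x4 plays the role of the color dimension:
library(e1071)

set.seed(1)
d2 <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
d2$x4 <- 2 * d2$x1 - d2$x2 + 0.5 * d2$x3 + rnorm(50, sd = 0.1)

svm.reg <- svm(x4 ~ x1 + x2 + x3, data = d2)   # numeric response -> eps-regression
cor(predict(svm.reg), d2$x4)                   # how well the fitted surface tracks x4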