Lambdas from cv.glmnet (R) used in online SGD - r

I'm using cv.glmnet from the glmnet package (in R). In the output I get a vector of lambdas (regularization parameters). I would like to use it in an online SGD algorithm. Is there a way of doing so, and how?
Any suggestion would be helpful.
I am wondering how I can compare results (in terms of the model's coefficients and the regularization parameter) between a generalized linear model with L1 regularization and a binomial distribution (logistic link function) that is fitted once, offline, using the cv.glmnet function from the R package (which, I think, uses a Newton-Raphson-type estimation algorithm), and an online model of the same type in which the estimates are re-calculated after every new observation using the classic (type I) stochastic gradient descent algorithm.
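A minimal sketch of what this could look like, assuming the goal is simply to reuse the cross-validated lambda.min as the L1 strength in a hand-rolled per-observation update (the data and the learning rate below are made up for illustration; note that glmnet defines lambda for a loss averaged over the n observations, so the scaling conventions should be double-checked):

library(glmnet)

set.seed(1)
x <- matrix(rnorm(500 * 10), ncol = 10)
y <- rbinom(500, 1, plogis(x %*% c(1, -1, rep(0, 8))))

cv <- cv.glmnet(x, y, family = "binomial", alpha = 1)
lambda <- cv$lambda.min                      # cross-validated penalty strength

# Hand-rolled online SGD for L1-penalized logistic regression
# (proximal / soft-thresholding step after each gradient step).
beta <- rep(0, ncol(x))
eta <- 0.01                                  # learning rate, illustrative value
for (i in seq_len(nrow(x))) {
  p <- plogis(sum(x[i, ] * beta))
  grad <- (p - y[i]) * x[i, ]                # gradient of the log-loss for one observation
  beta <- beta - eta * grad
  beta <- sign(beta) * pmax(abs(beta) - eta * lambda, 0)   # L1 shrinkage
}
beta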

Related

How to find the best-fitting models using forward, backward and stepwise selection in Poisson regression with R?

I am using the regsubsets method for linear regression and came across the step() method for selecting columns in logistic regression. I am not sure whether we can use regsubsets or step() for Poisson regression. It would be helpful if there is a method to find the best subsets for Poisson regression in R.
From here it looks like the options are:
- step() (base R: works on glm objects, so it covers Poisson regression models)
- the bestglm package
- the glmulti package
- possibly others.
Be prepared for the GLM (Poisson etc.) case to be much slower than the analogous problem for Gaussian responses (OLS/lm).
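As a minimal sketch of the step() route (simulated data, purely for illustration):

set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
d$counts <- rpois(200, lambda = exp(0.5 + 0.8 * d$x1))    # only x1 truly matters

fit_full <- glm(counts ~ x1 + x2 + x3, family = poisson, data = d)
fit_step <- step(fit_full, direction = "both")            # "forward", "backward", or "both"
summary(fit_step)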

Can GLMNet perform Logistic regression?

I am using glmnet to perform LASSO on binary logistic models, with cv.glmnet, to test selection consistency, and I would like to compare its performance with plain GLM logistic regression. For fairness' sake in comparing the outputs, I would like to use glmnet to perform this regression as well; however, I am unable to find any way to do so, barring calling glmnet with alpha = 0 and lambda = 0 (ridge without a penalty). I am unsure about this method, as it seems slightly janky: glmnet's manual discourages supplying a single lambda value (for speed reasons), and it provides no z-values to judge the confidence in the coefficients. (Essentially, the ideal output would be something similar to just using R's glm function.)
I've read through the entire manual and can't find a way of doing this. Is there a way to perform logistic regression with glmnet, without the penalty, in order to get an output similar to R's glm?
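A minimal sketch of the workaround described above, i.e. calling glmnet with lambda = 0 and comparing the coefficients against glm() (simulated data for illustration; glmnet's manual does discourage a single lambda value, and glmnet will still not report z-values, so this is only a rough check):

library(glmnet)

set.seed(1)
x <- matrix(rnorm(300 * 4), ncol = 4)
y <- rbinom(300, 1, plogis(x %*% c(1, -0.5, 0, 0)))

fit_glmnet <- glmnet(x, y, family = "binomial", lambda = 0)   # unpenalized fit
fit_glm <- glm(y ~ x, family = binomial)                      # ordinary logistic regression

cbind(glmnet = as.vector(coef(fit_glmnet)), glm = coef(fit_glm))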

Spark ML Pipeline Logistic Regression Produces Much Worse Predictions Than R GLM

I used an ML Pipeline to run logistic regression models, but for some reason I got worse results than in R. I have done some research, and the only post I found related to this issue is this one. It seems that Spark logistic regression returns models that minimize the loss function while R's glm function uses maximum likelihood. The Spark model only got 71.3% of the records right while R can predict 95.55% of the cases correctly. I was wondering if I did something wrong in the setup and if there is a way to improve the prediction. Below are my Spark code and R code.
Spark code
partial model_input
label,AGE,GENDER,Q1,Q2,Q3,Q4,Q5,DET_AGE_SQ
1.0,39,0,0,1,0,0,1,31.55709342560551
1.0,54,0,0,0,0,0,0,83.38062283737028
0.0,51,0,1,1,1,0,0,35.61591695501733
def trainModel(df: DataFrame): PipelineModel = {
  val lr = new LogisticRegression().setMaxIter(100000).setTol(0.0000000000000001)
  val pipeline = new Pipeline().setStages(Array(lr))
  pipeline.fit(df)
}

val meta = NominalAttribute.defaultAttr.withName("label").withValues(Array("a", "b")).toMetadata
val assembler = new VectorAssembler().
  setInputCols(Array("AGE", "GENDER", "DET_AGE_SQ",
    "QA1", "QA2", "QA3", "QA4", "QA5")).
  setOutputCol("features")

val model = trainModel(model_input)
val pred = model.transform(model_input)
pred.filter("label != prediction").count
R code
lr <- model_input %>% glm(data=., formula=label~ AGE+GENDER+Q1+Q2+Q3+Q4+Q5+DET_AGE_SQ,
family=binomial)
pred <- data.frame(y=model_input$label,p=fitted(lr))
table(pred$y, pred$p > 0.5)
Feel free to let me know if you need any other information. Thank you!
Edit 9/18/2015: I have tried increasing the maximum number of iterations and decreasing the tolerance dramatically. Unfortunately, it didn't improve the prediction. It seems the model converged to a local minimum instead of the global minimum.
It seems that Spark Logistic Regression returns models that minimize loss function while R glm function uses maximum likelihood.
Minimization of a loss function is pretty much the definition of a linear model, and both glm and ml.classification.LogisticRegression are no different here. The fundamental difference between the two is the way this minimization is achieved.
All linear models from ML/MLlib are based on some variant of gradient descent. The quality of a model generated with this approach varies on a case-by-case basis and depends on the gradient descent and regularization parameters.
R, on the other hand, computes an exact solution which, given its time complexity, is not well suited for large datasets.
As mentioned above, the quality of a model generated using gradient descent depends on the input parameters, so the typical way to improve it is to perform hyperparameter optimization. Unfortunately, the ML version is rather limited here compared to MLlib, but for starters you can increase the number of iterations.

Is it possible to customize a likelihood function for logit models using speedglm, biglm, and glm packages

I am trying to fit a customized logistic regression/survival analysis model using the optim/maxBFGS functions in R, literally defining the likelihood functions by hand.
I was always under the impression that for the packages speedglm, biglm, and glm, the likelihood functions for logit models (or whatever distribution) were hard-locked. However, I was wondering if I was mistaken, or if it is possible to specify my own likelihood functions. The reason is that optim/maxBFGS is a LOT slower to run than speedglm.
The R glm function is set up to work only with likelihoods from the exponential family. The fitting algorithms won't work with any other kind of likelihood, and with any other you are not in fact fitting a GLM but some other kind of model.
The glm functions fit using iteratively reweighted least squares; the special form of the likelihood for the exponential families makes Newton's method for solving the maximum-likelihood equations equivalent to repeatedly fitting weighted least squares regressions until convergence is achieved.
This is a faster process than generic nonlinear optimization, so if the likelihoods you want to use have been customized to the point that they are no longer from an exponential family, you are no longer fitting a generalized linear model. That means the IRWLS algorithm isn't applicable, and the fit will be slower, as you are finding.
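To make the contrast concrete, here is a minimal sketch (simulated data, purely illustrative) that fits the same logistic regression once with glm()'s IRLS and once by hand-coding the negative log-likelihood and passing it to optim() with BFGS; the hand-rolled route is the one that becomes slow once the likelihood leaves the exponential family:

set.seed(1)
x <- cbind(1, rnorm(500))                        # intercept + one covariate
y <- rbinom(500, 1, plogis(x %*% c(-0.5, 1)))

negll <- function(beta) {
  eta <- x %*% beta
  -sum(y * eta - log1p(exp(eta)))                # Bernoulli negative log-likelihood
}

fit_optim <- optim(c(0, 0), negll, method = "BFGS")
fit_glm <- glm(y ~ x[, 2], family = binomial)    # IRLS under the hood

rbind(optim = fit_optim$par, glm = coef(fit_glm))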

Output posterior distribution from bayesian network in R (bnlearn)

I'm experimenting with Bayesian networks in R and have built some networks using the bnlearn package. I can use them to make predictions for new observations with predict(); however, I would also like to have the posterior distribution over the possible classes. Is there a way of retrieving this information?
It seems like there is a prob parameter that does this for the naive Bayes implementation in the bnlearn package, but not for networks fitted with bn.fit.
Thankful for any help with this.
See the documentation of bnlearn.
The predict function implements prob only for naive.bayes and TAN classifiers.
In short, this is because all other methods do not necessarily compute posterior probabilities.
From the bnlearn documentation: predict returns the predicted values for node given the data specified by data. Depending on the value of method, the predicted values are computed as either a) parents or b) bayes-lw.
When using bayes-lw, likelihood weighting simulations are performed to make the predictions.
Hope this helps. :)
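For the naive Bayes / TAN case, a minimal sketch of retrieving the posterior class probabilities might look like the following (using bnlearn's built-in learning.test data; this is assumed usage, so check it against your bnlearn version):

library(bnlearn)

data(learning.test)                               # built-in discrete toy data
nb <- naive.bayes(learning.test, training = "A")  # "A" plays the role of the class node
fit <- bn.fit(nb, learning.test)

pred <- predict(fit, learning.test, prob = TRUE)
attr(pred, "prob")[, 1:5]                         # posterior class probabilities for the first 5 cases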
