Is there a way to rewrite R code in Python? - jupyter-notebook

How do I rewrite this code
library(ISLR)
data(Auto)
fit <- lm(mpg ~ horsepower, data = Auto)
summary(fit)
in Python?
I tried typing the above code into a Jupyter notebook, but I got error messages instead.

If you do wish to switch to Python, see this answer to 'What is the equivalent of R's lm function for fitting simple linear regressions in python?'. It also covers the equivalent of the line summary(fit).
Also related: R vs Python: Linear Regression.
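For a single predictor, the arithmetic behind lm(mpg ~ horsepower) can be sketched in plain Python as below. This is a minimal, dependency-free sketch and not necessarily what the linked answer does. The helper name simple_ols is hypothetical, and the ten data rows are placeholders for illustration, not the full Auto dataset. For real work, a library such as statsmodels offers an R-style formula interface and a fuller summary table.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rss = sum(r ** 2 for r in residuals)          # residual sum of squares
    tss = sum((yi - mean_y) ** 2 for yi in y)     # total sum of squares
    sigma2 = rss / (n - 2)                        # residual variance, n - 2 df
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": math.sqrt(sigma2 * (1 / n + mean_x ** 2 / sxx)),
        "se_slope": math.sqrt(sigma2 / sxx),
        "r_squared": 1 - rss / tss,
    }

# Placeholder values for illustration only (an assumption, not the full dataset)
horsepower = [130, 165, 150, 150, 140, 198, 220, 215, 225, 190]
mpg = [18, 15, 18, 16, 17, 15, 14, 14, 14, 15]

fit = simple_ols(horsepower, mpg)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

The returned dictionary holds the coefficient estimates, their standard errors, and R-squared, which are the core numbers that summary(fit) reports in R.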

Related

R, mitools::MIcombine, what is the reason for no p-values?

I am currently running a simple linear regression model with 5 multiply imputed datasets in R.
E.g. model <- with(imp, lm(outcome ~ exposure))
To pool the summary estimates I could use the command summary(mitools::MIcombine(model)) from the mitools package. However, this does not give results for p-values. I could also use the command summary(pool(model)) from the mice package and this does give results for p-values.
Because of this, I am wondering if there is a specific reason why MIcombine does not produce p-values?
After looking through the documentation, there doesn't seem to be a particular reason that the mitools library doesn't provide p-values, although the package's focus is on imputation, not model results.
However, you don't need either of these packages to see your results, along with the per-model p-values. I started writing this as a comment but decided to include the code. In case you weren't aware, you can use base R's summary. I realize that the output of mice is comparative, as is mitools, but I thought this was important enough to mention as well.
If the output of your call is model, then this will work.
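library(tidyverse)
# assumes model is a plain list of fitted lm objects; for a mice mira object, index model$analyses instead
map(seq_along(model), ~summary(model[[.x]]))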

Is there a Julia alternative to R predict() method for the MixedModels package?

I have created a linear mixed model in Julia 1.0.4 using the MixedModels and StatsModels packages because lme4 in R does not seem to run on my dataset. Now that I have the model, however, I have not been able to find a way to apply it to test data so I can gauge the model's accuracy. Is there a Julia alternative to the R predict() method that allows new data? I have tried using the Julia predict() method but it only works to predict the same data I created the model with.
I created the model just like this example:
mm1 = fit(LinearMixedModel, @formula(Y ~ 1 + (1 | G)), dyestuff)
but there does not seem to be a method that would allow something like
predict(mm1, test_data)
to work in Julia. I have also tried using RCall to send the mm1 variable over to R and use the predict() method that way, but it does not seem like @rput can work with that kind of variable. Thanks!

Obtaining the Linear Regression Model at each Leaf for M5P model

I am trying to figure out how to get the linear model at each leaf of a tree generated by the M5P method in the RWeka library in R, as output to a text file, so that I can write a separate lookup calculator program (say, in Excel for non-R users).
I am using
library(RWeka)
model = M5P(response ~ predictorA + predictorB, data = train)
I can get the tree output as model$classifier in a matrix. This works great thanks to this post.
If I give the command:
model
R prints model$classifier (the tree structure), followed by the LM at each leaf. I want to extract the coefficients of the LM at each leaf.
Using the following code, I am able to get the LM coefficients out of R.
library(rJava)
ModelTree = as.matrix(scan(text = .jcall(model$classifier, "S", "toString"), sep = "\n", what = ""))[-c(1:2, 6), , drop = FALSE]

How to keep a simulation from crashing when one application of the lrm function in rms cannot be fit?

I am running a Monte Carlo simulation with 1000 iterations. Within each iteration, I am fitting a weighted logistic regression model using the lrm function from Harrell's rms package. The model is fit using this code: lrm(y ~ x, weights = wt, x = TRUE, y = TRUE)
From the fitted model, I extract some information such as the regression coefficients and the estimated standard errors.
The simulations crashed with the error message:
Unable to fit model using "lrm.fit".
I would like to prevent the simulations from crashing, by only evaluating the function if it is safe to do so. In the very large majority of iterations, there is no problem. Somehow, within each iteration, I would like to tell R to only fit the function if it can be done safely.
Is there a way that this can be done?
Consider using try, which will report an error but will not exit the whole loop or function.
library(rms)
for (i in 1:10) {
  fit <- try(lrm(y ~ x, weights = wt, x = TRUE, y = TRUE))
  if (inherits(fit, "try-error")) next  # skip this iteration if the fit failed
}
Here, something relevant to lrm (such as x, for example) would change on each iteration.

Test for Multicollinearity in Panel Data R

I am running a panel data regression using the plm package in R and want to control for multicollinearity between the explanatory variables.
I know there is the vif() function in the car package; however, as far as I know, it cannot deal with panel data output.
The plm package can run other diagnostics, such as a unit root test, but I found no method to check for multicollinearity.
Is there a way to calculate a similar test to vif, or can I just regard each variable as a time-series, leaving out the panel information and run tests using the car package?
I cannot disclose the data, but the problem should be relevant to all panel data models.
The dimension is roughly 1,000 observations, over 50 time-periods.
The code I use looks like this:
pdata <- plm.data(RegData, index=c("id","time"))
fixed <- plm(Y~X, data=pdata, model="within")
and then
vif(fixed)
returns an error.
Thank you in advance.
This question has been asked with reference to other statistical packages such as SAS https://communities.sas.com/thread/47675 and Stata http://www.stata.com/statalist/archive/2005-08/msg00018.html, and the common answer has been to use a pooled model to get the VIF. The logic is that since multicollinearity concerns only the independent variables, there is no need to control for individual effects using panel methods.
Here's some code extracted from another site:
library(plm)  # plm(), plm.data()
library(car)  # vif()
mydata = read.csv("US Panel Data.csv")
attach(mydata)  # not sure if that's really needed
Y = cbind(Return)  # not sure what that is doing
pdata = plm.data(mydata, index = c("id", "t"))
model = plm(Y ~ 1 + ESG + Beta + Market.Cap + PTBV + Momentum + Dummy1 + Dummy2 + Dummy3 + Dummy4 + Dummy5 +
            Dummy6 + Dummy7 + Dummy8 + Dummy9,
            data = pdata, model = "pooling")
vif(model)
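The pooled-model logic rests on the fact that the VIF depends only on the regressors: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors, so neither Y nor the panel index enters the calculation. The rough sketch below illustrates this by computing VIFs as the diagonal of the inverse correlation matrix of the predictors. It is written in dependency-free Python only to keep the illustration self-contained. The function names and the toy columns are assumptions, not data or code from the question.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Toy regressors (made up for illustration): x2 tracks x1 closely, x3 does not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
x3 = [2.0, -1.0, 0.5, 3.0, -2.0, 1.0]
toy_vifs = vif([x1, x2, x3])
print([round(v, 2) for v in toy_vifs])
```

Because only the predictor columns are passed in, the same values result whether the rows are treated as a panel or as a pooled sample, which is why the pooled model is enough for this diagnostic.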
