Clustered standard errors in linear regression

Newbie question: I am currently writing my thesis about the impact of an analysis tool on the number of fans a company has. I created the following regression model: Fans ~ eventWeek * analysistool
The problem with this model is that it assumes every row in the data is independent, which is not the case here. Therefore, my thesis coach advised me to use clustered standard errors in R, clustered on Company Number. Does anyone know how to do this?

I believe this document answers your question in terms of GLMs.

Related

How to specify a vector full of means with degrees of freedom for a Lack of Fit F-Test in R

Currently I'm working through Applied Linear Statistical Models, $5^{th}$ ed., by Kutner et al. A question I'm working on asks me to perform an F-test for lack of fit on my linear model. The model is a simple linear regression with one predictor, nothing too troublesome.
To perform the test one has to assess the difference between the full model and the reduced model. At this point the authors say to take the full model as $\hat{\mu}_j = \bar{Y}_j$. Specifically the screenshot below says the following:
The reduced model would be the simple linear model:
I have no problem being able to do this manually within R, by computing the necessary values where need be as I've done for other questions. But I'm trying to improve my R skill set and this is where my problem lies.
I have read other answers related to this, and model comparison can be done directly with the anova() function. But I'm having trouble stating my full model correctly so that I can use anova(). I thought about computing a "vector of means" for the subgroups of data (which I display here just for completeness)
But then I would most likely run into the problem of anova() not being able to compute the degrees of freedom correctly. My data set is very small, and this seems like a situation that would come up all the time. With huge data sets it would not be feasible to compute things manually, so surely there is a way to phrase my full model so that the means of the subgroups of replicates are computed for me. But how do I do so?

For completeness and posterity, an answer was given on a sister site where I also asked this question:
https://stats.stackexchange.com/questions/539958/how-to-specify-a-vector-full-of-means-with-degrees-of-freedom-for-a-lack-of-fit
The mods can delete this question if they see fit and it doesn't contribute to the community.
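For readers who want to see the arithmetic behind the test, here is a minimal sketch of the lack-of-fit F statistic in plain Python. It assumes a single predictor with replicated X levels; the data and the function name `lack_of_fit_f` are made up for illustration and are not from the textbook or the linked answer. The full model uses the mean of Y at each distinct X level, and the reduced model is the fitted straight line. In R, the same comparison is commonly set up by treating the predictor as a factor in the full model, e.g. anova(lm(Y ~ X), lm(Y ~ factor(X))), which lets anova() work out the degrees of freedom.

```python
from collections import defaultdict


def lack_of_fit_f(x, y):
    """Return (F, df_lack_of_fit, df_pure_error) for a simple linear model."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Reduced model: ordinary least-squares line y = b0 + b1 * x
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse_reduced = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Full model: one mean per distinct x level (pure error)
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    sspe = 0.0
    for ys in groups.values():
        mean_j = sum(ys) / len(ys)
        sspe += sum((yi - mean_j) ** 2 for yi in ys)

    c = len(groups)            # number of distinct x levels
    df_lack_of_fit = c - 2     # (n - 2) - (n - c)
    df_pure_error = n - c
    sslf = sse_reduced - sspe  # lack-of-fit sum of squares
    f_stat = (sslf / df_lack_of_fit) / (sspe / df_pure_error)
    return f_stat, df_lack_of_fit, df_pure_error


# Hypothetical data: four x levels with two replicates each
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [2.0, 2.4, 3.9, 4.1, 7.8, 8.2, 16.1, 15.9]
f_stat, df_lf, df_pe = lack_of_fit_f(x, y)
print(f_stat, df_lf, df_pe)
```

The F statistic is then compared against an F distribution with (c - 2, n - c) degrees of freedom, where c is the number of distinct X levels.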

How to run Longitudinal Ordinal Logistic Regression in R

I'm working with a large data set with repeated patients over multiple months with ordered outcomes on a severity scale from 1 to 5. I was able to analyze the first set of patients using the polr function to run a basic ordinal logistic regression model, but now want to analyze association across all the time points using a longitudinal ordinal logistic model. I can't seem to find any clear documentation online or on this site so far explaining which package to use and how to use it. I am also an R novice so any simple explanations would be incredibly useful. Based on some initial searching it seems like the mixor function might be what I need though I am not sure how it works. I found it on this site
https://cran.r-project.org/web/packages/mixor/vignettes/mixor.pdf
Would appreciate a simple explanation of how to use this function if this is the right one, or would happily take any alternate suggestions with an explanation.
Thank you in advance for your help!

Machine Learning Suggestions [closed]

I have data on a lot of students who were selected by some colleges based on their marks. I am new to machine learning. Can I have some suggestions on how I can use Azure Machine Learning to predict the colleges they can get into based on their marks?

Try a multi-class logistic regression - also look at this https://gallery.cortanaanalytics.com/Experiment/da44bcd5dc2d4e059ebbaf94527d3d5b?fromlegacydomain=1

Apart from logistic regression, as @neerajkh suggested, I would also try
one-vs-all classifiers. This method usually works very well in multiclass problems (I assume you have many inputs, which are the marks of the students, and many outputs, which are the different colleges).
To implement the one-vs-all algorithm I would use Support Vector Machines (SVM). It was one of the most powerful algorithms until deep learning came onto the scene, but you don't need deep learning here.
If you could consider changing framework, I would suggest using Python libraries. In Python it is very straightforward to solve the problem you are facing, and it runs very fast.

Use a random forest and feed this ML algorithm to OneVsRestClassifier, which is a multi-class classifier.

Keeping in line with other posters' suggestions of using multi-class classification, you could use artificial neural networks (ANNs)/multilayer perceptron to do this. Each output node could be a college and, because you would be using a sigmoid transfer function (logistic) the output for each of the nodes could be directly viewed as the probability of that college accepting a particular student (when trying to make predictions).

Why don't you try softmax regression?
In extremely simple terms, softmax takes an input and produces the probability distribution of the input belonging to each one of your classes. In other words, based on some input (grades in this case), your model can output the probability distribution that represents the "chance" a given student has of being accepted to each college.
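As a rough illustration of that idea, here is a minimal sketch of the softmax function itself in plain Python. The college names and raw scores are hypothetical; in a real model the scores would come from fitted weights applied to a student's marks.

```python
import math


def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    # Subtracting the max score keeps exp() numerically stable
    # and does not change the resulting probabilities.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one student, one per college
college_scores = {"College A": 2.0, "College B": 1.0, "College C": 0.1}
probs = dict(zip(college_scores, softmax(list(college_scores.values()))))
for college, p in probs.items():
    print(college, round(p, 3))
```

The college with the largest raw score always gets the largest probability, and the probabilities across all colleges add up to 1.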

I know this is an old thread but I will go ahead and add my 2 cents too.
I would recommend adding a multi-class, multi-label classifier. This allows you to find more than one college for a student. Of course this is much easier to do with an ANN, but an ANN is much harder to configure (for example the network layout, the number of nodes and hidden nodes, or even the activation function).
The easiest method, as @Hoap Humanoid suggests, is to use a Support Vector Classifier.
For any of these methods it is a given that you need a sufficiently diverse data set. I can't say how many data points you need; you have to experiment. But the accuracy of the model depends on the number of data points and their diversity.

This is very subjective. Just applying any algorithm that classifies into categories won't be a good idea. Without performing exploratory data analysis and checking the following things (in addition to missing values), you can't be sure predictive analytics will work:
Quantitative and qualitative variables.
Univariate, bivariate and multivariate distributions.
The relationship of each variable to your response (college) variable.
Outliers (multivariate and univariate).
Required variable transformations.
Whether the Y variable can be broken down into chunks, for example by location: first whether a candidate is more likely to get into colleges in California or New York, and then, if California is more likely, which college. In this way you could capture linear and non-linear relationships.
For base learners you can fit a softmax regression model or one-vs-all logistic regression (which of the two does not matter much), and CART for non-linear relationships. I would also run k-NN and k-means to check for distinct groups within the data and then decide on the predictive learners.
I hope this makes sense!

The least-squares support vector machine (LSSVM) is a powerful algorithm for this application. Visit http://www.esat.kuleuven.be/sista/lssvmlab/ for more information.

Setting Contrasts for ANOVA in R

I've been attempting to perform an ANOVA in R recently on the attached data frame.
My question revolves around the setting of contrasts.
My design is a 3x5 within-subjects design.
There are 3 visual conditions under 'Circle1' and 5 audio under 'Beep1'.
Does anyone have any idea how I should set the contrasts? This is something I'm unfamiliar with as I'm making the transition from point and click stats in SPSS to coded in R.
Thanks for your time
Data file:

Reiterating my answer from another Stack Overflow question that was flagged as similar: since you didn't provide any code, you might start by having a look at the contrast package in R. As they note in the documentation:
"The purpose of the contrast package is to provide a standardized interface for testing linear combinations of parameters from common regression models. The syntax mimics the contrast.Design function from the Design library. The contrast class has been extended in this package to linear models produced using the functions lm, glm, gls, lme and geese."
There is also a nice little tutorial here by Dr. William King, who talks about factorial between-subjects ANOVA and includes an abundance of R code. This is wider in scope than your question but would be a great place to start (just to get context).
Finally, here is another resource that you can refer to which talks about setting up orthogonal contrasts in R.
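To make "orthogonal contrasts" concrete, here is a small language-neutral sketch in Python of the two checks involved. The weights are a hypothetical example for a 3-level factor such as the visual condition and are not taken from the linked resource. It assumes a balanced design (equal cell sizes), where two contrasts are orthogonal if the products of their weights sum to zero.

```python
def is_contrast(weights, tol=1e-12):
    """A set of weights is a contrast if the weights sum to zero."""
    return abs(sum(weights)) < tol


def are_orthogonal(a, b, tol=1e-12):
    """With equal cell sizes, contrasts are orthogonal if their dot product is zero."""
    return abs(sum(wa * wb for wa, wb in zip(a, b))) < tol


# Hypothetical contrasts for a 3-level factor:
# level 1 vs. the average of levels 2 and 3, then level 2 vs. level 3
first_vs_rest = [2, -1, -1]
second_vs_third = [0, 1, -1]

print(is_contrast(first_vs_rest), is_contrast(second_vs_third))
print(are_orthogonal(first_vs_rest, second_vs_third))
```

A factor with k levels supports at most k - 1 mutually orthogonal contrasts, so the 3-level factor gets two and the 5-level factor gets four.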

Deploy R statistical models in WSO2?

A newbie question on WSO2 and 'R'....
I have a customer where they are looking to build some statistical models using 'R'. These models are mostly associated with customer scoring, i.e. sucking in a table of customer data with behavioural attributes as columns, and spitting out a 'score' for each customer.
Two questions on this:
Can 'R' models be deployed like rules in a service model?
Could you deploy R models into a WSO2 middleware, and if so, how and where?
TIA

Note: I'm not familiar with WSO2, but I am familiar with R.
The answer to your question very much depends on what type of models you would like to deploy. The easiest ones are models such as linear/logistic regression, followed by decision trees.
They are easy because linear and logistic regression give you a simple formula that you can plug into any programming interface. An example prediction formula might look like the following:
customer_predicted_life_time_value =
17.25+2.365*num_of_products_held-16.12*time_at_address+25.36*monthly_income.
Similarly, decision trees can easily be exported as a set of if-then-else rules (there are at least a couple of packages in R which will translate an R decision tree model into rules).
Technically you could deploy a randomForest model in the form of rules too, but that would be cumbersome to implement.
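To show what "plugging the formula in" could look like outside R, here is a minimal sketch in Python. The linear score uses the example coefficients from the formula above. The decision-tree function is purely hypothetical: its split points and labels are invented for illustration and do not come from any fitted model, and nothing here is specific to WSO2.

```python
def predicted_lifetime_value(num_of_products_held, time_at_address, monthly_income):
    """Linear-regression score using the example coefficients quoted above."""
    return (17.25
            + 2.365 * num_of_products_held
            - 16.12 * time_at_address
            + 25.36 * monthly_income)


def customer_segment(num_of_products_held, monthly_income):
    """Hypothetical decision tree written out as if-then-else rules."""
    if monthly_income < 2.0:           # hypothetical split point
        return "low"
    if num_of_products_held >= 3:      # hypothetical split point
        return "high"
    return "medium"


# Score one made-up customer record
print(predicted_lifetime_value(2, 1, 3))
print(customer_segment(2, 3))
```

Because both functions are plain arithmetic and comparisons, the same logic can be re-expressed in whatever rule or expression language the target platform supports.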
