Tests of Between-Subjects Effects - R

I used R code to reproduce the "Tests of Between-Subjects Effects" table that SPSS calculates, but the results are different and I really don't know why. Please help me. Thanks a lot.
fit2222 <- manova(cbind(postrunning, postbalance, postbilateral, poststrngth,
                        postupperlimb, postresponse, postvisumotor, postdexterity)
                  ~ group + prerunning + prebalance + prebilateralcoordination +
                    prestrength + preupperlimbcoordination + preresponse +
                    previsumotor + predexterity, data = Data)
summary.aov(fit2222)
[Screenshot: output in R]

[Screenshot: output in SPSS]

It's impossible to give a complete answer to this, or at least to say exactly what you should be doing differently, but it's clear from the output you're showing that you're not comparing the same models in the two packages. Note the numerator degrees of freedom (df), which are all 1 in R and 2 in SPSS.
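One common cause of exactly this df pattern (an assumption here — the coding of group is not visible in the question) is that group is stored as a numeric variable in R, so manova() fits it as a single slope with 1 df, while SPSS treats it as a three-level factor with 2 df. A minimal sketch of the difference:

```r
set.seed(1)
d <- data.frame(group = rep(1:3, each = 10), y = rnorm(30))

# group stored as numeric: fitted as one slope, 1 numerator df (as in the R output)
anova(lm(y ~ group, data = d))["group", "Df"]

# group converted to a factor with 3 levels: 2 numerator df (as in the SPSS output)
anova(lm(y ~ factor(group), data = d))["factor(group)", "Df"]
```

If this is the cause, running Data$group <- factor(Data$group) before calling manova() should bring the two programs into line.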

Related

Difference between R and SPSS linear model results

I'm a beginner at statistics, currently attending an introductory course that uses SPSS. I've been trying to learn R at the same time, and so far I've consistently been getting the same results from both tools, as expected.
However, we're currently doing correlations (Pearson's rho) and fitting linear models, and I'm consistently getting different results between R and SPSS.
The dataset is GSS2012.zip (from the linked zip file).
d = GSS2012$tolerance
e = GSS2012$age
f = GSS2012$polviews
g = GSS2012$educ
            SPSS     R            std. error (SPSS)
intercept    6.694    7.29707726  0.623
e           -0.031   -0.03130627  0.006
f           -0.123   -0.20586503  0.072
g            0.411    0.40029541  0.033
Full, minimal working examples producing the results above are given below.
I've tried different use="..." options for cor(); it made no difference:
cor(d, e, use = "pairwise.complete.obs")
Full, minimal working example for lm:
> library(haven)
> GSS2012 <- read_sav("full version/GSS2012.sav")
> lm(GSS2012$tolerance ~ GSS2012$age + GSS2012$polviews + GSS2012$educ, na.action="na.exclude", singular.ok = F)
Call:
lm(formula = GSS2012$tolerance ~ GSS2012$age + GSS2012$polviews +
GSS2012$educ, na.action = "na.exclude", singular.ok = F)
Coefficients:
(Intercept) GSS2012$age GSS2012$polviews GSS2012$educ
7.29708 -0.03131 -0.20587 0.40030
Nothing has so far given me the same values as SPSS. Not that I know the latter are necessarily correct; I'd just like to replicate the results.
SPSS script:
DATASET ACTIVATE DataSet1.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT tolerance
/METHOD=ENTER age polviews educ.
Articles like these are probably related: link1; link2; link3, but I haven't been able to use the information therein to replicate the SPSS output. (Again, R might have more accurate results; I don't know. But I'm in "an SPSS environment", so it would be good if I could get the same results for now.)
This is only a partial answer, as I can see what the problem is but I'm not sure what is causing it. The issue has to do with missing values and how they're handled in the SPSS file. Let's just take the educ variable as an example...
In the SPSS file you can see that the values 97, 98, and 99 are defined as missing values:
If you sort the SPSS file by the educ column, you can see there are 2 data rows with these missing values. They are IDs 837 and 1214:
In R, you can confirm that those rows do in fact contain missing values (NA):
> which(is.na(GSS2012$educ))
[1] 837 1214
The problem is that in SPSS, when you actually tell it to count how many rows have missing values, it says there's only 1 missing data row:
FREQUENCIES VARIABLES=educ
/FORMAT=NOTABLE
/ORDER= ANALYSIS .
The problem is with ID 1214: SPSS is not considering that 99 value for 1214 to be missing. For example, try changing educ for 837 to any other (non-missing) number, and you'll see that SPSS says there are 0 missing rows for educ, when in fact 1214 should still be missing (99).
I haven't checked, but I'm guessing a similar thing is happening to a number of rows for the polviews variable.
The consequence of this is that SPSS isn't treating those rows as missing data when you run the analysis, while in R those values are correctly set as missing and omitted. In other words, SPSS is using more data for the model than it should be. You can confirm this by looking at the SPSS and R output: the degrees of freedom are different across the two programs, which then leads to a (slight) difference in results.
I'm not sure why SPSS isn't treating those rows as missing. It could be a bug (it wouldn't be the first for SPSS...) or something to do with the way the file has been set up. I haven't checked the latter because it's a big file and I'm not familiar enough with the dataset to know where to look.
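One defensive workaround on the R side (a sketch — 97, 98, and 99 are assumed here to be the user-missing codes for the affected variables) is to force those codes to NA explicitly, so the analysis does not depend on how the .sav metadata was read:

```r
miss_codes <- c(97, 98, 99)  # assumed user-missing codes for educ/polviews
recode_na  <- function(x, codes = miss_codes) replace(x, x %in% codes, NA)

educ <- c(12, 16, 99, 14, 97)  # toy stand-in for GSS2012$educ
recode_na(educ)
```

On the real data this would be GSS2012$educ <- recode_na(GSS2012$educ), and similarly for polviews; it won't make SPSS behave, but it guarantees the R side is handling the codes consistently.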

In R, what is the difference between ICCbare and ICCbareF in the ICC package?

I am not sure if this is the right place to ask a question like this, but I'm not sure where else to ask.
I am currently doing some research on data and have been asked to find the intraclass correlation of the observations within patients. In the data, some patients have 2 observations, some only have 1 and I have an ID variable to assign each observation to the corresponding patient.
I have come across the ICC package in R, which calculates the intraclass correlation coefficient, but there are 2 commands available: ICCbare and ICCbareF.
I do not understand the difference between them, as they give completely different ICC values on the same variables. For example, on the same variable x:
ICCbare(ID,x) gave me a value of -0.01035216
ICCbareF(ID,x) gave me a value of 0.475403
The second one, using ICCbareF, is almost the same as the estimated correlation I get when using random-effects models.
So I am just confused, and would like to understand the algorithm behind each so I can explain them in my research. I know one is meant to be used when the data are balanced and there are no NA values.
The description says the ICC is either calculated "by hand" or using ANOVA - what does that mean?
From the documentation (https://www.rdocumentation.org/packages/ICC/versions/2.3.0/topics/ICCbare):
ICCbare can be used on balanced or unbalanced datasets with NAs. ICCbareF is similar, however ICCbareF should not be used with unbalanced datasets.
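For the "by hand / ANOVA" part: the classic one-way ANOVA estimator of the ICC is (MSB − MSW) / (MSB + (k − 1)·MSW), where MSB and MSW are the between- and within-group mean squares and k is the group size. A sketch with balanced toy data (simulated here; the ICC package itself is not assumed to be installed):

```r
set.seed(42)
k  <- 2                                  # observations per patient
id <- factor(rep(1:10, each = k))        # 10 patients
x  <- rnorm(10)[rep(1:10, each = k)] +   # patient-level effect
      rnorm(10 * k, sd = 0.5)            # within-patient noise

ms  <- anova(lm(x ~ id))[["Mean Sq"]]    # MSB, then MSW (residual)
icc <- (ms[1] - ms[2]) / (ms[1] + (k - 1) * ms[2])
icc
```

With unbalanced data, k is replaced by an adjusted average group size, which is one place where functions aimed at balanced versus unbalanced designs can diverge.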

Nested model in R

I'm having a huge problem with a nested model I am trying to fit in R.
I have response time experiment with 2 conditions with 46 people each and 32 measures each. I would like measures to be nested within people and people nested within conditions, but I can't get it to work.
The code I thought should make sense was:
nestedmodel <- lmer(responsetime ~ 1 + condition +
(1|condition:person) + (1|person:measure), data=dat)
However, all I get is an error:
Error in checkNlevels(reTrms$flist, n = n, control) :
number of levels of each grouping factor must be < number of observations
Unfortunately, I do not even know where to start looking what the problem is here.
Any ideas? Please, please, please? =)
Cheers!
This might be more appropriate on CrossValidated, but: lme4 is trying to tell you that one or more of your random effects is confounded with the residual variance. As you've described your data, I don't quite see why: you should have 2*46*32=2944 total observations, 2*46=92 combinations of condition and person, and 46*32=1472 combinations of measure and person.
If you do
lf <- lFormula(responsetime ~ 1 + condition +
(1|condition:person) + (1|person:measure), data=dat)
and then
lapply(lf$reTrms$Ztlist,dim)
to look at the transposed random-effect design matrices for each term, what do you get? You should (based on your description of your data) see that these matrices are 1472 by 2944 and 92 by 2944, respectively.
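With toy data matching the dimensions described (46 people crossed with 2 conditions and 32 measures; the response is just noise here), that diagnostic looks like:

```r
library(lme4)
set.seed(1)
# 2 * 46 * 32 = 2944 rows, as in the question
dat <- expand.grid(condition = factor(1:2), person = factor(1:46),
                   measure = factor(1:32))
dat$responsetime <- rnorm(nrow(dat))

lf <- lFormula(responsetime ~ 1 + condition +
                 (1 | condition:person) + (1 | person:measure), data = dat)
lapply(lf$reTrms$Ztlist, dim)  # one 1472 x 2944 matrix, one 92 x 2944
```

If one of those matrices had as many rows as there are observations, that would be the confounding with the residual variance that lmer is complaining about.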
As @MrFlick says, a reproducible example would be nice. Other things you could show us are:
fit the model anyway, using lmerControl(check.nobs.vs.nRE="ignore") to skip the check, and show us the results (especially the random-effect variances and the reported numbers of groups)
show us the result of with(dat, table(table(interaction(condition, person)))) to give information on the number of replicates per combination (and similarly for measure)

How to replicate Stata "factor" command in R

I'm trying to replicate some Stata results in R and am having a lot of trouble. Specifically, I want to recover the same eigenvalues as Stata does in exploratory factor analysis. To provide a specific example, the factor help in Stata uses bg2 data (something about physician costs) and gives you the following results:
webuse bg2
factor bg2cost1-bg2cost6
(obs=568)
Factor analysis/correlation Number of obs = 568
Method: principal factors Retained factors = 3
Rotation: (unrotated) Number of params = 15
--------------------------------------------------------------------------
Factor | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 0.85389 0.31282 1.0310 1.0310
Factor2 | 0.54107 0.51786 0.6533 1.6844
Factor3 | 0.02321 0.17288 0.0280 1.7124
Factor4 | -0.14967 0.03951 -0.1807 1.5317
Factor5 | -0.18918 0.06197 -0.2284 1.3033
Factor6 | -0.25115 . -0.3033 1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(15) = 269.07 Prob>chi2 = 0.0000
I'm interested in the eigenvalues in the first column of the table. When I use the same data in R, I get the following results:
library(foreign)  # provides read.dta()
bg2 = read.dta("bg2.dta")
eigen(cor(bg2))
$values
[1] 1.7110112 1.4036760 1.0600963 0.8609456 0.7164879 0.6642889 0.5834942
As you can see, these values are quite different from Stata's results. It is likely that the two programs use different means of calculating the eigenvalues, but I've tried a wide variety of methods of extracting them, including most (if not all) of the options in the R commands fa, factanal, and principal, among others. I simply cannot extract the same eigenvalues as Stata. I've also read through Stata's manual to try to figure out exactly what method Stata uses, but couldn't work it out with enough specificity.
I'd love any help! Please let me know if you need any additional information to answer the question.
I would advise against carrying out a factor analysis on all the variables in the bg2 data, as one of them is clinid, an arbitrary identifier (1..568) that carries no information except by accident.
Sensibly or not, you are not using the same data: you worked on the 6 cost variables in Stata, but on those PLUS the identifier in R.
Another way to notice that would be to spot that you got 6 eigenvalues in one case and 7 in the other.
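You can see both counts with a toy stand-in for bg2 (simulated here, since the Stata dataset isn't at hand; the variable names follow the Stata example):

```r
set.seed(7)
# fake bg2: an id column plus six cost variables
bg2 <- data.frame(clinid = 1:100,
                  matrix(rnorm(600), 100, 6,
                         dimnames = list(NULL, paste0("bg2cost", 1:6))))

length(eigen(cor(bg2))$values)                            # 7: id included
length(eigen(cor(bg2[, paste0("bg2cost", 1:6)]))$values)  # 6: id dropped
```

Dropping clinid before calling cor() is the first step toward making the two analyses comparable.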
Nevertheless the important principle is that eigen(cor(bg2)) is just going to give you the eigenvalues from a principal component analysis based on the correlation matrix. So you can verify that pca in Stata would match what you report from R.
So far, so clear.
But your larger question remains. I don't know how to mimic Stata's (default) factor analysis in R. You may need a factor analysis expert, if any hang around here.
In short, PCA is not equal to principal axis method factor analysis.
Different methods of calculating eigenvalues are not the issue here. I'd bet that given the same matrix Stata and R match up well in reporting eigenvalues. The point is that different techniques mean different eigenvalues in principle.
P.S. I am not an R person, but I think what you call R commands are strictly R functions. In turn I am open to correction on that small point.

Regression coefficients by group in dataframe R

I have data of various companies' financial information organized by company ticker. I'd like to regress one of the columns' values against the others while keeping the company constant. Is there an easy way to write this out in lm() notation?
I've tried using:
library(lme4)  # or nlme; both provide lmList()
reg <- lmList(lead2.dDA ~ paudit1 + abs.d.GINDEX + logcapx + logmkvalt +
              logmkvalt2 | pp, data = reg.df)
where pp is a vector of company names, but this returns coefficients as though I regressed all the data at once (and did not separate by company name).
A convenient and apparently little-known syntax for estimating separate regression coefficients by group in lm() involves using the nesting operator, /. In this case it would look like:
reg <- lm(lead2.dDA ~ 0 + pp/(paudit1 + abs.d.GINDEX + logcapx +
logmkvalt + logmkvalt2), data=reg.df)
Make sure that pp is a factor and not a numeric. Also notice that the overall intercept must be suppressed for this to work; in the new formulation, we have a different "intercept" for each group.
A couple comments:
Although the regression coefficients obtained this way will match those given by lmList(), it should be noted that with lm() we estimate only a single residual variance across all the groups, whereas lmList() would estimate separate residual variances for each group.
Like I mentioned in my earlier comment, the lmList() syntax that you gave looks like it should have worked. Since you say it didn't, this leads me to expect that really the problem is something else (although it's hard to tell what without a reproducible example), and so it seems likely that the solution I posted will fail for you as well, for the same unknown reasons. If you want more detailed guidance, please provide more information; help us help you.
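To see the nesting-operator syntax in action on a built-in dataset (mtcars stands in for the questioner's data here, with cyl playing the role of pp):

```r
# separate "intercept" and wt slope for each cyl group
fit <- lm(mpg ~ 0 + factor(cyl)/wt, data = mtcars)
coef(fit)

# the group-wise coefficients match fitting each group on its own
coef(lm(mpg ~ wt, data = subset(mtcars, cyl == 4)))
```

The coefficients named factor(cyl)4, factor(cyl)4:wt, etc. reproduce the per-group fits, while (as noted above) a single residual variance is shared across groups.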
