I am using ezANOVA to calculate ANOVAs. I would like to calculate a Bayesian measure that makes use of the values returned in the aov object.
However, I have difficulties accessing the values that are returned in the aov object and consequently do not know how to address them in the function I use for the Bayesian measure.
Let me give an example...
data("ANT") ## example data
rt_anova = ezANOVA(
data = ANT[ANT$error==0,]
, dv = rt
, wid = subnum
, within = .(cue,flank)
, return_aov = T
)
rt_anova
We now get the following for the main effect of cue:
Stratum 2: subnum:cue

Terms:
                      cue Residuals
Sum of Squares  225485.61   8970.99
Deg. of Freedom         3        57
I now need to access the degrees of freedom as well as the sums of squares, but I must admit I currently have no clue how; they do not seem to be accessible via something like
rt_anova$aov$........
Any suggestions are very welcome!
THANKS!
If you look at rt_anova$aov, you'll see it is of class aovlist (check with class(rt_anova$aov)). Do some further exploration with names(rt_anova$aov), and you can figure out that what you need is probably rt_anova$aov$`subnum:cue`.
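From there, here is a minimal sketch of pulling out the degrees of freedom and sums of squares (assuming the ANT example above; the stratum is itself an aov fit, so summary() gives you its ANOVA table as a data frame):
strat <- rt_anova$aov$`subnum:cue`   # the stratum for the cue effect
tab <- summary(strat)[[1]]           # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)
df_cue   <- tab[["Df"]][1]           # the effect row comes first, Residuals second
df_resid <- tab[["Df"]][2]
ss_cue   <- tab[["Sum Sq"]][1]
ss_resid <- tab[["Sum Sq"]][2]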
I would like to get the "Proportion Var" line from pca$loadings below as an object or vector, so that I can use its values in the PCA graphic.
I did the following:
library(psych)
data(iris)
pca <- principal(iris[1:4],nfactors = 2,rotate="varimax",scores=TRUE)
pca$loadings
How can I extract the proportion of variance?
Another way is to compute it manually, but first you need to extract all factors (i.e., all axes).
You should set nfactors to the number of variables you have, which in your example is 4. So it should look like this:
pca <- principal(iris[1:4],nfactors = 4,rotate="varimax",scores=TRUE)
Then extract pca$loadings, after which you can compute the proportion of variance by taking the sum of squared loadings per component (RC in this case) and dividing it by the total sum of squared loadings, like this:
colSums(pca$loadings[ , ]^2)/sum(pca$loadings[ , ]^2)
This should give you the same information as the Proportion Var line in pca$loadings, albeit for all the components (RC1 to RC4 in this case).
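For instance, a sketch of keeping those values in a named vector for annotating a plot (using the 4-factor fit above; the barplot is just one way to use it):
prop_var <- colSums(pca$loadings[ , ]^2) / sum(pca$loadings[ , ]^2)
round(prop_var, 3)    # named by component, e.g. RC1, RC2, ...
barplot(prop_var, ylab = "Proportion of variance explained")
Note that with all four factors extracted, the denominator equals the number of variables (4), i.e. the total variance of the standardized variables.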
I have a matrix of plants (rows) and pollinators (columns) with interaction frequencies in the cells (converted to 0 = no interaction and 1 = interaction(s) present for this analysis).
I'm using the vegan package and have produced a species accumulation curve.
library(vegan)
accum <- specaccum(mydata[1:47,], method = "random", permutations = 1000)
plot(accum)
I now would like to predict how many new pollinator species I would be likely to find with additional plant sampling, but can't figure out in what format I have to supply "newdata" to the predict command. I have tried empty rows and rows of zeros within the matrix but was not able to get results. This is the code I've used for the prediction:
predictaccum1 <- predict(accum, newdata=mydata[48:94,])
The error message:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "specaccum"
The error message does not change if I specify interpolation = c("linear") or "spline".
Could anyone help please?
Perhaps not the clearest way of putting it, but the documentation says:
newdata: Optional data used in prediction interpreted as number of
sampling units (sites).
So newdata should be the number of sampling units you had; a single number or a vector of numbers will do. However, the predict function cannot extrapolate; it only interpolates. The non-linear regression models of fitspecaccum may be able to extrapolate, but should you trust them?
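For example, with the accumulation object from the question (a small sketch; this assumes a vegan version that provides a predict method for specaccum objects):
predict(accum, newdata = c(10, 20, 40))   # expected richness after 10, 20 and 40 sampled plants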
Here is a bit about the dangers of extrapolation: the non-linear regression models are conventionally used for analysing species accumulation data, but none of them is really firmly based on theory -- they are just some nice non-linear regression models. I know of some models that may have a firmer basis, but we haven't implemented them in vegan, nor do we plan to do so (contributions are welcome, though).
However, it is possible to get some idea of the problems by subsampling your data and seeing whether you can estimate the overall number of species by extrapolating from your subsample. The following shows how to do this with the BCI data in vegan. These data have 50 sample plots with 225 species. We take subsamples of 25 plots and extrapolate to 50:
mod <- c("arrhenius", "gleason", "gitay", "lomolino", "asymp", "gompertz",
"michaelis-menten", "logis", "weibull")
extraps <- matrix(NA, 100, length(mod))
colnames(extraps) <- mod
for(i in 1:nrow(extraps)) {
## use the same accumulation for all nls models
m <- specaccum(BCI[sample(50,25),], "exact")
for(p in mod) {
## need try because some nls models can fail
tmp <- try(predict(fitspecaccum(m, p), newdata=50))
if(!inherits(tmp, "try-error")) extraps[i,p] <- tmp
}
}
When I tried this, most extrapolation models did not include the correct number of species among their predictions: the values were either all higher than the correct richness (from worst: Arrhenius, Gitay, Gleason) or all lower than the correct richness (from worst: logistic, Gompertz, asymptotic, Michaelis-Menten, Lomolino, Weibull; only the last two of these included the correct richness in their range).
In summary: in the absence of theory and an adequate model, beware of extrapolation.
I'm relatively new to R and am currently in the process of constructing a PLS model using the pls package. I have two independent datasets of equal size; the first is used here for calibrating the model. The dataset comprises multiple response variables (y) and 101 explanatory variables (x) for 28 observations. The response variables, however, will each be included separately in a PLS model. The code currently looks as follows:
# load data
data <- read.table("....txt", header=TRUE)
data <- as.data.frame(data)
# define response variables (y)
HEIGHT <- as.numeric(unlist(data[2]))
FBM <- as.numeric(unlist(data[3]))
N <- as.numeric(unlist(data[4]))
C <- as.numeric(unlist(data[5]))
CHL <- as.numeric(unlist(data[6]))
# generate matrix containing the explanatory (x) variables only
spectra <- data[8:ncol(data)]
# calibrate PLS model using LOO and 20 components
library(pls)
refl.pls <- plsr(N ~ as.matrix(spectra), ncomp=20, validation = "LOO", jackknife = TRUE)
# visualize RMSEP -vs- number of components
plot(RMSEP(refl.pls), legendpos = "topright")
# calculate explained variance for x & y variables
summary(refl.pls)
I have currently arrived at the point at which I need to decide, for each response variable, the optimal number of components to include in my PLS model. The RMSEP values already provide a decent indication. However, I would also like to base my decision on the PRESS (Predicted Residual Sum of Squares) statistic, in accordance with various studies comparable to the one I am conducting. So, in short, I would like to extract the PRESS statistic for each PLS model with n components.
I have browsed through the pls package documentation and across the web, but unfortunately have been unable to find an answer. If anyone could help point me in the right direction, that would be greatly appreciated!
You can find the PRESS values in the mvr object.
refl.pls$validation$PRESS
You can see this either by exploring the object directly with str or by perusing the documentation more thoroughly. If you look at ?mvr you will see the following:
validation if validation was requested, the results of the
cross-validation. See mvrCv for details.
Validation was indeed requested, so we follow this to ?mvrCv, where you will find:
PRESS a matrix of PRESS values for models with 1, ...,
ncomp components. Each row corresponds to one response variable.
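So, with the model from the question, a minimal sketch would be:
press <- refl.pls$validation$PRESS   # one row per response, one column per number of components
press
which.min(press[1, ])                # number of components with the smallest PRESS for N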
I will delete this if it is too loosely related to programming, but my search has turned up NULL, so I'm hoping someone can help.
I have a case/control matched-pairs design with repeated measurements and am looking for a model/function/package in R.
I have 2 measures at time=1 and 2 measures at time=2. I have case/control status as Group (2 levels) and matched-pair id as match_id, and want to estimate the effect of Group, time and their interaction on speed, a continuous variable.
I wanted to do something like this:
(reg_id is the actual participant ID)
speed_model <- geese(speed ~ time*Group, id = c(reg_id,match_id),
data=dataforGEE, corstr="exchangeable", family=gaussian)
Where I want to model the autocorrelation within a person via reg_id, but also within the matched pairs via match_id
But I get:
Error in model.frame.default(formula = speed ~ time * Group, data = dataFullGEE, :
variable lengths differ (found for '(id)')
Can geese or GEE in general not handle clustering around 2 sets of id? Is there a way to even do this? I'm sure there is.
Thank you for any help you can provide.
This is definitely a better question for Cross Validated, but since you have exactly 2 observations per subject, I would consider the ANCOVA model:
geese(speed_at_time_2 ~ speed_at_time_1*Group, id = c(match_id),
data=dataforGEE, corstr="exchangeable", family=gaussian)
Regarding the use of ANCOVA, you might find this reference useful.
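Whether and how this applies depends on your data layout; here is only a sketch of getting the data into the wide form that call expects (the column names speed_at_time_1/speed_at_time_2 and the choice to average the two replicate measures per time point are assumptions, not something prescribed above):
library(reshape2)
library(geepack)
# average the two measures per subject and time point, then spread time into columns
wide <- dcast(dataforGEE, reg_id + match_id + Group ~ time,
              value.var = "speed", fun.aggregate = mean)
names(wide)[names(wide) == "1"] <- "speed_at_time_1"
names(wide)[names(wide) == "2"] <- "speed_at_time_2"
wide <- wide[order(wide$match_id), ]   # geese expects clusters to be contiguous
geese(speed_at_time_2 ~ speed_at_time_1 * Group, id = match_id,
      data = wide, corstr = "exchangeable", family = gaussian)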
Hi… I have a very basic question regarding the input of weighted data into R. Currently I have to process data (mostly for curve-fitting purposes) similar to the following:
> head(mydata, 10)
           v        sf
1  0.3003434  3.933106
2  0.3027852  5.947432
3  0.3052270  9.832596
4  0.3076688 12.927439
5  0.3101106 14.197519
6  0.3125525 13.572904
7  0.3149943 11.691078
8  0.3174361  9.543095
9  0.3198779  8.048558
10 0.3223197  7.660252
The first column is the data (increasing and equidistant), while the second column gives the frequencies (weights); currently these weights don't add up to one, but I can easily fix that.
Now, I searched for handling weighted data in R, and the closest I found was using the survey package and its svydesign() command, but does it really have to be that hard?
What I did to work around my lack of knowledge, and what got me in trouble with the Kolmogorov-Smirnov test (more below), is the following:
> y <- with(mydata, c(rep(v, times=floor(10*sf))))
which repeats the elements of the first column in proportion to the corresponding weight (times 10 to get a whole number). But now the problem is that when I conduct the Kolmogorov-Smirnov goodness-of-fit test, I get a warning that the p-value cannot be computed since the data have ties.
The question is: how can I input and process the data in its original form (i.e. as a frequency or probability table) for the purpose of curve fitting? Thanks.