ARTool package in R - multiple within factors

I have recently discovered the ARTool package for R (https://cran.r-project.org/web/packages/ARTool/) when looking for a non-parametric alternative for a repeated measures ANOVA.
I have used ARTool and find it very useful, but I have come across a problem that I am not sure how to deal with. Specifically, the Df.res values seem to be strongly inflated as soon as I have more than one within factor. I did not see this when I tried it with two between factors, a between and a within factor, or two between factors and one within factor, but whenever I add a second within factor, Df.res seems to become inflated.
I just wondered whether I am misunderstanding something or maybe there is an explanation that I am not aware of.
Any response would be greatly appreciated.
Many thanks!
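For reference, a minimal sketch of the kind of two-within-factor design described above (simulated data; variable names W1, W2, subject are illustrative, not from the original post):

```r
library(ARTool)

# Simulated fully-crossed within-subjects design: 20 subjects,
# two within factors with two levels each.
set.seed(1)
d <- expand.grid(subject = factor(1:20),
                 W1 = factor(c("a", "b")),
                 W2 = factor(c("x", "y")))
d$y <- rnorm(nrow(d))

# Aligned-rank transform with a random intercept per subject.
m <- art(y ~ W1 * W2 + (1 | subject), data = d)
anova(m)  # inspect the Df.res column for the within-factor terms
```

Running a sketch like this with one versus two within factors makes it easy to see how Df.res changes as the question describes.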

Related

Using permanova in r to analyse the effect of 3 independent variables on reef systems

I am trying to understand how to run a PERMANOVA using adonis2 in R to analyse some data that I have collected. I have been looking online, but as often happens, the explanations are a bit convoluted, so I am asking for your help. I have some fish and coral groups as columns, as well as 3 independent variables (reef age, depth, and material). [Snapshot of my dataset structure]

I think I have understood that p-values are not the only important part of the output, and that the R2 values indicate how much each variable contributes to the model. Is there something wrong, or something I am missing, here?

I also think I understood that I should check for homogeneity of variance, but I have not understood whether I should check it for each variable independently, or whether I should include them all in the same bit of code (which does not seem to work). Here is the code I am using to run the PERMANOVA (1), and the code I am trying to use to assess homogeneity of variance, which does not work (2).
(1) adonis2(species ~ Age + Material + Depth, data = data.var, by = "margin")
'species' is the subset of the dataset including all the species counts, while 'data.var' is the subset including the 3 independent variables. Also, what is the difference between using '+' and '*' in the formula? When I use '*' it gives me: 'Error in qr.X(object$CCA$QR) : need larger value of 'ncol' as pivoting occurred'. What does this mean?
(2) variance.check <- betadisper(species.distance, data.var, type = "centroid", bias.adjust = FALSE)
'species.distance' is a distance matrix calculated with 'vegdist' using the Bray-Curtis method. I used 'data.var' to check variance across all 3 independent variables, but it does not work, while it does work if I check them independently (3). Why is that?
(3) variance.check <- betadisper(species.distance, data$Depth, type = "centroid", bias.adjust = FALSE)
Thank you in advance for your responses, and for your help. It will really help me to get my head around it (and sorry for the many questions).
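On the betadisper point, a minimal sketch (reusing the object names from the question, with 'species' and 'data.var' assumed to exist as described): betadisper() expects a single grouping factor rather than a data frame, which is why passing 'data.var' fails while passing a single column works. One common workaround is to combine the factors with interaction():

```r
library(vegan)

# Bray-Curtis distances between sites (as in the question)
species.distance <- vegdist(species, method = "bray")

# Combine the three predictors into one grouping factor,
# since betadisper() accepts only a single factor.
grp <- interaction(data.var$Age, data.var$Material, data.var$Depth)

variance.check <- betadisper(species.distance, grp, type = "centroid")
anova(variance.check)       # parametric test of dispersion homogeneity
# permutest(variance.check) # permutation-based alternative
```

Whether a combined grouping or per-variable checks is more appropriate depends on the design, so this is only one option among those the question asks about.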

How to prepare the variables for propensity score matching in R using MatchIt?

I have a large dataset with around 200 columns and 1 million rows. I have a treatment group, and I'm trying to create a control group using propensity matching score based on about 15 different variables.
I have two questions that I've found conflicting answers to online, and I would appreciate it if you could help me out.
1) How should I organize the data to best run the matching process? My data has a mix of numeric, character, and factor (some ordered, others not) variables. I've seen some people online saying that MatchIt runs the analysis with character variables, while others say that it does not work with the 'nearest' method but does with other ones. So, should I put some effort into converting everything into numeric or factor (which I'm not sure will be possible), or can I run MatchIt with my variables as they are?
2) Has MatchIt been updated to accept NAs in variables that are not used for matching? I've seen some old posts saying that MatchIt needed a COMPLETE dataset, even for the variables that were not being used for matching, but those posts also said that this would probably be fixed. Is that still the case?
Thanks
1) Beyond the data type, the question you should ask yourself is whether it makes sense to give categorical data to a propensity score setting. Propensity scores are based on distances between observations, and calculating distances between categorical attributes is inherently difficult. So even though, technically speaking, MatchIt does support other types, numeric features are the only really sensible input. You can either discard the categorical data or convert it to numeric (by creating dummy variables and numerically encoding ordinal features). Alternatively, you can keep the categorical features and impose exact matching on them using the exact parameter of the matchit function (note that in this case, you are not really doing propensity score matching anymore).
2) This issue has not been solved in the current version, 3.0.2, which is admittedly annoying.
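A hedged sketch of point 1 (variable names treat, age, income, and sex are illustrative, not from the original question): the numeric covariates go into the propensity score model, while the categorical one is handled via the exact parameter. In recent MatchIt versions, exact accepts a one-sided formula:

```r
library(MatchIt)

# Numeric covariates enter the propensity score model;
# the categorical covariate is matched exactly instead.
m.out <- matchit(treat ~ age + income,
                 data   = df,
                 method = "nearest",
                 exact  = ~ sex)

summary(m.out)              # balance diagnostics
matched <- match.data(m.out)  # extract the matched sample
```

As the answer notes, exact matching on a variable means you are no longer relying on the propensity score for that variable.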

VAR model with variable combination and variation

I tried searching for an answer to this question, but I could not find anything.
I want to build a model that predicts barley prices. For that, I came up with 11 variables that may have an impact on the prices. What I tried was building a loop that each time picks one extra variable from my pool, tries different combinations of them, and outputs a new VAR model for every combination, so in a sense it is a combinatorics exercise. After that, I want to implement in-sample/out-of-sample testing for each of the models to decide which one is most appropriate. Unfortunately, I am not very familiar with loops and I have been told not to use them in R... As I am a beginner in R, my attempts won't help you much, but if you really require them, I am happy to provide them.
Many thanks in advance!
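The combinatorial search described above can be sketched roughly as follows (a hypothetical outline, assuming 'prices' is a data frame of suitably stationary series with a 'barley' column; with 11 candidate variables this means 2^11 - 1 = 2047 fits, so expect it to take a while):

```r
library(vars)

candidates <- setdiff(colnames(prices), "barley")
fits <- list()

# Loop over every subset size, then over every subset of that size.
for (k in seq_along(candidates)) {
  for (combo in combn(candidates, k, simplify = FALSE)) {
    dat <- prices[, c("barley", combo)]
    fits[[paste(combo, collapse = "+")]] <- VAR(dat, p = 2, type = "const")
  }
}

# One simple way to compare models: AIC of the barley equation.
# sapply(fits, function(f) AIC(f$varresult$barley))
```

Loops are fine in R for a task like this; for out-of-sample comparison you would refit each specification on a training window and score its forecasts on a holdout window instead of (or in addition to) an information criterion.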

Which technique is best used to find optimal split on numeric data to reduced error on group?

I have a dataset that contains a numeric variable and a binary categorical variable. I want to find the optimal split on the numeric variable that can be used to quickly classify the categories and limit the amount of error.
I have used a decision tree to do this, but I am wondering whether there are better optimisation methods out there.
I would like to be able to do this in R but am having trouble writing the function for it.
Please help me understand this simple optimisation problem. Thanks!
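One direct approach is an exhaustive search over candidate cut points, choosing the threshold that minimises misclassification error — a minimal base-R sketch on simulated data (the variable names x and y are illustrative):

```r
# Simulated data: two overlapping classes along a numeric variable.
set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 2))
y <- rep(c(0, 1), each = 50)

best_split <- function(x, y) {
  cuts <- sort(unique(x))
  err <- sapply(cuts, function(cut) {
    pred <- as.integer(x > cut)
    # Allow either labelling of the two sides of the split.
    min(mean(pred != y), mean((1 - pred) != y))
  })
  list(cut = cuts[which.min(err)], error = min(err))
}

best_split(x, y)
```

This is exactly the search a one-split decision tree (a "stump") performs under 0-1 loss, so a tree restricted to depth 1 and a brute-force threshold search should agree up to ties.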

Determine number of factors in EFA (R) using Comparison Data

I am looking for ways to determine the optimal number of factors for R's factanal function. The most common method (run a PCA and use the scree plot to determine the number of factors) is already known to me. I found a method described here that seems easier for non-technical folks like me. Unfortunately, the R script in which the method was implemented is no longer accessible. I was wondering whether there is an R package that does the same?
The method was originally proposed in this study: Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure.
The R code has now been moved here, according to the author.
EFA.dimensions is also a nice and easy-to-use package for that.
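For a scriptable alternative to the eyeball-the-scree-plot method, parallel analysis via the psych package is a common choice — a minimal sketch using psych's built-in bfi dataset (this is a related technique, not the comparison-data method from the linked study):

```r
library(psych)

# 25 personality items from the bfi dataset shipped with psych.
data(bfi, package = "psych")

# Parallel analysis: compares observed eigenvalues with those from
# random data and suggests a number of factors to retain.
fa.parallel(bfi[, 1:25], fa = "fa")
```

The suggested number of factors is printed to the console and shown on the accompanying plot, so no manual scree-plot judgement is needed.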
