In R, what is the difference between ICCbare and ICCbareF in the ICC package?

I am not sure if this is the right place to ask a question like this, but I'm not sure where else to ask it.
I am currently doing some research on data and have been asked to find the intraclass correlation of the observations within patients. In the data, some patients have 2 observations, some only have 1 and I have an ID variable to assign each observation to the corresponding patient.
I have come across the ICC package in R, which calculates the intraclass correlation coefficient, but there are 2 commands available: ICCbare and ICCbareF.
I do not understand what the difference between them is, as they give completely different ICC values on the same variables. For example, on the same variable, x:
ICCbare(ID,x) gave me a value of -0.01035216
ICCbareF(ID,x) gave me a value of 0.475403
The second one using ICCbareF is almost the same as the estimated correlation I get when using random effects models.
So I am just confused and would like to understand the algorithm behind them so I could explain them in my research. I know one is to be used when the data is balanced and there are no NA values.
The description says the ICC is calculated either "by hand" or using ANOVA. What do these mean?

From the documentation (https://www.rdocumentation.org/packages/ICC/versions/2.3.0/topics/ICCbare):
ICCbare can be used on balanced or unbalanced datasets with NAs. ICCbareF is similar, however ICCbareF should not be used with unbalanced datasets.
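To make the "ANOVA" route concrete, here is a minimal base-R sketch of the classic one-way ANOVA estimator of the ICC. It is not the ICC package's source, and the function name icc_oneway is hypothetical. It estimates the among-group variance as (MS_among - MS_within) / n0, where n0 = (N - sum(n_i^2) / N) / (a - 1) is an "effective group size" for a groups with n_i observations each (N in total), then reports ICC = var_among / (var_among + MS_within). With balanced data n0 equals the common group size, so a formula that simply plugs in one group size is only appropriate when every patient has the same number of observations, which is not the case when some patients have 1 observation and others 2.

```r
# Sketch: one-way ANOVA estimate of the ICC in base R (not the ICC package's code).
# Unbalanced groups are handled with the n0 "effective group size" correction.
icc_oneway <- function(id, y) {
  ok <- !is.na(id) & !is.na(y)        # drop rows with a missing ID or value
  id <- factor(id[ok])                # must be a factor: a numeric ID would be fitted as one slope
  y  <- y[ok]
  tab <- anova(aov(y ~ id))           # rows: "id" (among groups), "Residuals" (within groups)
  ms_among  <- tab[["Mean Sq"]][1]
  ms_within <- tab[["Mean Sq"]][2]
  n_i <- as.vector(table(id))         # observations per group (e.g. 1 or 2 per patient)
  a   <- length(n_i)
  N   <- sum(n_i)
  n0  <- (N - sum(n_i^2) / N) / (a - 1)
  var_among <- (ms_among - ms_within) / n0
  var_among / (var_among + ms_within)
}
```

Called as icc_oneway(ID, x), it gives a number that can be compared against both package functions and against the random-effects estimate.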

Related

Metric for evaluating agreement at inter-rater reliability for a single subject by multiple raters

I'm making a rating survey in R (Shiny) and I'm trying to find a metric that can evaluate agreement for only one of the "questions" in the survey. The ratings range from 1 to 5. There are multiple raters, and each rater rates a set of 10 questions on that scale.
I've used Fleiss' kappa and Krippendorff's alpha for the whole set of questions and raters, and that works, but when evaluating each question separately these metrics give negative values. I tried calculating them by hand (from the formulas) and still get the same results, so I guess they don't work for a small sample of subjects (in this case a sample of 1).
I've looked at other metrics, like rwg in the multilevel package, but so far I can't get it to work. According to the R documentation:
rwg(x, grpid, ranvar=2)
Where:
x = A vector representing the construct on which to estimate agreement.
grpid = A vector identifying the groups from which x originated.
Can someone explain what the rwg function expects from me?
If anyone knows another agreement metric that might work better, please let me know.
Thanks.
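For orientation, rwg compares the observed variance of the ratings within one group with the variance expected if raters answered completely at random. For a uniform "no agreement" null on an A-point scale that expected variance is (A^2 - 1) / 12, which is exactly 2 for a 1 to 5 scale, matching the ranvar=2 in the signature above. So presumably x is the vector of all ratings and grpid says which question each rating belongs to. The base-R sketch below (hypothetical function rwg_by_item, not the multilevel package's code) computes 1 - observed variance / expected variance per question.

```r
# Sketch: within-group agreement r_wg per question, in base R.
# ratings: numeric vector of all ratings; item: which question each rating belongs to.
rwg_by_item <- function(ratings, item, n_options = 5) {
  ranvar <- (n_options^2 - 1) / 12    # variance of a uniform null; equals 2 for a 1-5 scale
  sapply(split(ratings, item), function(r) 1 - var(r) / ranvar)
}
```

With the multilevel package the analogous call would be rwg(ratings, question_id, ranvar = 2), where both vectors are in long format (one row per rater-question pair).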

For synthetic control (Synth package), how to avoid using predictors.op?

I am trying to use the Synth package in R.
The way synthetic control works is that it matches pre-treatment data for a treated unit and control units, choosing weights to approximately equate the two, so that the treated unit "looks like" a synthetic control unit.
The way it works is explained here.
When matching on the pre-treatment outcomes, we can pick up to T0 linear combinations of the data. The Synth package seems to pick only one, the one that equates the MEANS. This is what the predictors.op argument does.
Suppose, however, that I want to select all T0 linear combinations, so that X1 is a T0 x 1 vector rather than 1 x 1. Is there a way to do this without building it manually?
I am not sure exactly what you are trying to do, but I ran into your question because I had a similar issue with Synth(), so maybe this will help:
I tried to create a synthetic control unit using all pre-treatment outcome observations, and since Synth() averages across all pre-treatment periods, that wasn't straightforward. What I did was create an individual covariate for each pre-treatment period and then list those covariates in predictors. That is equivalent to not applying any operator to the pre-treatment outcome data.
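The per-period covariate trick described above can be sketched in base R. The function add_period_covariates and the column naming scheme (outcome_period, e.g. y_1985) are hypothetical. For each chosen pre-treatment period it copies every unit's outcome in that period into a new column that is constant within the unit, so averaging that column over any time window leaves it unchanged.

```r
# Sketch: add one column per pre-treatment period holding each unit's outcome in that period.
# df is a long panel with one row per unit and period.
add_period_covariates <- function(df, unit, time, outcome, periods) {
  for (p in periods) {
    vals <- df[df[[time]] == p, c(unit, outcome)]   # each unit's outcome in period p
    names(vals)[2] <- paste0(outcome, "_", p)       # e.g. "y_1985"
    df <- merge(df, vals, by = unit, all.x = TRUE, sort = FALSE)
  }
  df
}
```

The new column names (for example paste0("y_", pre_periods)) can then be passed to the predictors argument of dataprep().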

Can I use quickpred in Mice to impute a subset of variables from a larger set of variables in a nested longitudinal (and long) dataframe?

I've tried to create a test data.frame to demonstrate my question, but my R skills aren't quite strong enough to do even that. I am not in a position to share my actual database. I hope my question can stand on its own.
I am working with a nested longitudinal dataset that is saved as a long file (1000 subjects nested in 8 sites, 4 potential time points/subject, 68 potential predictor variables). I want to impute missing values on 4 static predictors (e.g., maternal education, family income) prior to conducting lme on the longitudinal outcomes in order to have a consistent number of cases for all models.
I am working with the mice package in R. From everything I have read, it is recommended that the imputation model include all the variables in my analysis models plus any other variables that may predict the missing values. Given the number of variables in my models, I need something like quickpred to simplify this, but I'm getting an error that I do not understand.
I tried the following initial code for my database N2NPL, indicating c(14, 16, 18, 19) as the variables that I want to predict.
iniN2NPL <- mice(N2NPL[,c(14,16,18,19)], pred= quickpred(N2NPL,
minpuc = 0.25, exclude = c('ID','TypeConvNon','TypeCtPr','TypeName','CHR_converter')),
maxit = 0)
Error in check.predictorMatrix(setup) :
  The predictorMatrix has 73 rows and 73 columns. Both should be 4
I know that the predictor matrix from mice::quickpred must be square, but is there any way of not imputing all of the variables? Is it sufficient to include site as a predictor, given that subjects are nested within sites?
Thank you for any help directing me to the proper code or instructions on this. The examples I see all seem much simpler than mine, and thus little help with the issues I'm having.
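The error arises because quickpred(N2NPL) returns a 73 x 73 matrix (one row and column per column of N2NPL), while mice() was handed only 4 columns. One common way around this, sketched below on made-up data, is to pass the full data frame to mice() and switch off imputation for every column except the targets by setting their method entries to "". make.method() and quickpred() are mice (3.x) functions; the toy data, column names and settings are hypothetical, and in this toy data only the target columns have missing values.

```r
library(mice)

# Toy stand-in for a larger data set: only "income" and "educ" have missing values.
set.seed(1)
n <- 200
d <- data.frame(site = rep(1:8, length.out = n), age = rnorm(n, 40, 10))
d$income <- 2 * d$age + rnorm(n, 0, 5)
d$educ   <- 0.5 * d$age + rnorm(n)
d$income[sample(n, 30)] <- NA
d$educ[sample(n, 30)]   <- NA

targets <- c("income", "educ")
meth <- make.method(d)                         # default method for every column
meth[setdiff(names(meth), targets)] <- ""      # "" = do not impute this column
pred <- quickpred(d, minpuc = 0.25)            # square: one row/column per column of d

imp <- mice(d, m = 1, maxit = 5, method = meth, predictorMatrix = pred,
            printFlag = FALSE, seed = 1)
filled <- complete(imp)                        # targets imputed, other columns untouched
```

Because the predictor matrix and the data now have the same columns, the dimension check passes, while only the target columns receive imputations.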

how to find differentially methylated regions (for example with probe lasso in Champ) based on regression continuous variable ~ beta (with CpGassoc)

I ran 450K Illumina methylation chips on human samples and want to test the association between a continuous variable and beta values, adjusted for other covariates. For this, I used the CpGassoc package in R. I would also like to search for differentially methylated regions based on the significant CpG sites. However, the probe lasso function in the ChAMP package, and other packages for 450K DMR analyses, always assume 2 groups between which DMRs are to be found. I do not have 2 groups, only this continuous variable. Is there a way to load my CpGassoc output into the probe lasso function from ChAMP, or into another bump-hunting package? I'm an MD, not a bioinformatician, so comb-p etc. would not be feasible for me.
Thank you very much for your help.
Kind regards,
Line
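The core idea behind region finding, grouping nearby significant CpGs, can be sketched directly from per-CpG results. The function find_regions below is a hypothetical, naive distance-based grouping (not probe lasso and not bumphunter): it keeps CpGs with P below alpha, sorts them by chromosome and position, and starts a new region whenever the gap to the previous significant CpG exceeds max_gap base pairs. It assumes the CpGassoc P-values have been joined to chromosome and position columns from the 450K annotation; the thresholds are arbitrary placeholders, and no region-level multiple-testing correction is done.

```r
# Sketch: naive grouping of significant CpGs into candidate regions by genomic distance.
find_regions <- function(chr, pos, pval, alpha = 0.05, max_gap = 500, min_cpgs = 2) {
  keep <- !is.na(pval) & pval < alpha
  d <- data.frame(chr = as.character(chr[keep]), pos = pos[keep], stringsAsFactors = FALSE)
  if (nrow(d) == 0) {
    return(data.frame(chr = character(0), start = numeric(0), end = numeric(0), n_cpgs = integer(0)))
  }
  d <- d[order(d$chr, d$pos), ]
  # a new region starts at a chromosome change or when the gap exceeds max_gap
  starts_new <- c(TRUE, d$chr[-1] != d$chr[-nrow(d)] | diff(d$pos) > max_gap)
  d$region <- cumsum(starts_new)
  regions <- do.call(rbind, lapply(split(d, d$region), function(r)
    data.frame(chr = r$chr[1], start = min(r$pos), end = max(r$pos), n_cpgs = nrow(r))))
  regions <- regions[regions$n_cpgs >= min_cpgs, ]
  rownames(regions) <- NULL
  regions
}
```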
I have not worked with methylation data before, so take what I say with a grain of salt. Also, please don't use acronyms without spelling them out; I'm guessing most people on this site don't know what a DMR is.
You could use the lasso from the glmnet package. For example, suppose your continuous variable were age and meth.dt is your methylation data.table, with one column per CpG site holding the amount of methylation and one row per subject. Note that the family argument describes the distribution of the response (here age), not of the methylation values, so for a continuous response the gaussian family is the natural choice. I can't get too specific, but the following code should work after adjusting the column range to your data.
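# load libraries
library(data.table)
library(glmnet)

# read in data: one row per subject, one column per CpG site, plus an Age column
meth.dt <- fread("/data")

# predictor matrix: adjust 1:70999 to the columns that hold methylation values
x <- as.matrix(meth.dt[, 1:70999, with = FALSE])
y <- meth.dt$Age

# lasso path, then 10-fold cross-validation to choose lambda (family describes the response)
AgeLasso <- glmnet(x, y, family = "gaussian")
cv.AgeLasso <- cv.glmnet(x, y, family = "gaussian")

# non-zero coefficients at the largest lambda within 1 SE of the minimum CV error
cf <- coef(cv.AgeLasso, s = "lambda.1se")[, 1]
coefSites <- cf[cf != 0]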
This will give you the methylation sites that are the best predictors of your continuous variable using a parsimonious model. For additional info about glmnet see http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html
You might also want to ask the people over at Cross Validated; they may have better answers. http://stats.stackexchange.com
Out of curiosity, what is your continuous variable?
Let me know how you ended up solving it if you don't use this method.

How to replicate Stata "factor" command in R

I'm trying to replicate some Stata results in R and am having a lot of trouble. Specifically, I want to recover the same eigenvalues as Stata does in exploratory factor analysis. To provide a specific example, the factor help in Stata uses bg2 data (something about physician costs) and gives you the following results:
webuse bg2
factor bg2cost1-bg2cost6
(obs=568)

Factor analysis/correlation                Number of obs    =        568
  Method: principal factors                Retained factors =          3
  Rotation: (unrotated)                    Number of params =         15

--------------------------------------------------------------------------
     Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
    Factor1  |      0.85389      0.31282            1.0310       1.0310
    Factor2  |      0.54107      0.51786            0.6533       1.6844
    Factor3  |      0.02321      0.17288            0.0280       1.7124
    Factor4  |     -0.14967      0.03951           -0.1807       1.5317
    Factor5  |     -0.18918      0.06197           -0.2284       1.3033
    Factor6  |     -0.25115            .           -0.3033       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated:  chi2(15) = 269.07  Prob>chi2 = 0.0000
I'm interested in the eigenvalues in the first column of the table. When I use the same data in R, I get the following results:
library(foreign)  # provides read.dta()
bg2 = read.dta("bg2.dta")
eigen(cor(bg2))
$values
[1] 1.7110112 1.4036760 1.0600963 0.8609456 0.7164879 0.6642889 0.5834942
As you can see, these values are quite different from Stata's results. It is likely that the two programs are using different means of calculating the eigenvalues, but I've tried a wide variety of different methods of extracting the eigenvalues, including most (if not all) of the options in R commands fa, factanal, principal, and maybe some other R commands. I simply cannot extract the same eigenvalues as Stata. I've also read through Stata's manual to try and figure out exactly what method Stata uses, but couldn't figure it out with enough specificity.
I'd love any help! Please let me know if you need any additional information to answer the question.
I would advise against carrying out a factor analysis on all the variables in the bg2 data as one of the variables is clinid, which is an arbitrary identifier 1..568 and carries no information, except by accident.
Sensibly or not, you are not using the same data, as you worked on the 6 cost variables in Stata and those PLUS the identifier in R.
Another way to notice that would be to spot that you got 6 eigenvalues in one case and 7 in the other.
Nevertheless the important principle is that eigen(cor(bg2)) is just going to give you the eigenvalues from a principal component analysis based on the correlation matrix. So you can verify that pca in Stata would match what you report from R.
So far, so clear.
But your larger question remains. I don't know how to mimic Stata's (default) factor analysis in R. You may need a factor analysis expert, if any hang around here.
In short, PCA is not equal to principal axis method factor analysis.
Different methods of calculating eigenvalues are not the issue here. I'd bet that given the same matrix Stata and R match up well in reporting eigenvalues. The point is that different techniques mean different eigenvalues in principle.
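The Stata output reports "Method: principal factors". As far as I know, that method factors the correlation matrix after replacing its diagonal with squared multiple correlations (SMCs) as communality estimates, without iterating. That would explain the negative eigenvalues, and it fits the table: the eigenvalues of such a "reduced" matrix sum to the total of the SMCs, and Stata's proportions are each eigenvalue divided by that sum. Below is a base-R sketch under that assumption (pf_eigenvalues is a hypothetical name); I have not checked it against bg2, so a match with Stata is something to verify rather than a given.

```r
# Sketch: eigenvalues of the correlation matrix with SMCs on the diagonal
# (non-iterated principal factors), as opposed to PCA's eigen(cor(data)).
pf_eigenvalues <- function(data) {
  R <- cor(data, use = "complete.obs")
  smc <- 1 - 1 / diag(solve(R))      # R^2 of each variable regressed on all the others
  R_reduced <- R
  diag(R_reduced) <- smc
  eigen(R_reduced, symmetric = TRUE, only.values = TRUE)$values
}
```

To compare like with like, it should be applied to the six cost variables only, e.g. pf_eigenvalues(bg2[, paste0("bg2cost", 1:6)]), leaving out clinid.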
P.S. I am not an R person, but I think what you call R commands are strictly R functions. In turn I am open to correction on that small point.
