I've successfully run a logistic regression for a complex-design survey in which the data were imputed into multiple datasets and combined with mitools.
Although I can get the confidence interval and significance of each variable, I'm interested in testing the significance of a block of variables (the dummy variables that represent a categorical variable with several categories). That could be accomplished by subtracting the log-likelihoods of models fitted with and without this block of variables.
Can this be accomplished with mitools?
Thank you.
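For reference, here is a minimal sketch of one possible route, with every object name (imps, psu, strata, wt, y, x1, cat) standing in for the real ones. Because svyglm fits by design-weighted quasi-likelihood, there is no true log-likelihood to subtract; a Wald test on the pooled coefficients of the dummy block is the usual substitute (a more refined multi-parameter pooling such as mice::D1 may be preferable):

library(survey)
library(mitools)

# 'imps' is an imputationList of the imputed data sets; psu, strata,
# and wt are placeholders for the real design variables
des <- svydesign(id = ~psu, strata = ~strata, weights = ~wt, data = imps)

# fit the model on every imputed data set and pool with Rubin's rules
fits <- with(des, svyglm(y ~ x1 + factor(cat), family = quasibinomial()))
pooled <- MIcombine(fits)

# Wald chi-squared test for the whole block of dummies for 'cat'
idx <- grep("factor(cat)", names(coef(pooled)), fixed = TRUE)
b <- coef(pooled)[idx]
V <- vcov(pooled)[idx, idx]
W <- as.numeric(t(b) %*% solve(V) %*% b)
pchisq(W, df = length(idx), lower.tail = FALSE)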
I am using the rcorr function from the Hmisc package in R to compute Pearson correlation coefficients and the corresponding p-values while analyzing the correlations among several fishery landings time series. The data themselves aren't really important here; what I would like to know is: how are the p-values calculated? The documentation states that the asymptotic P-values are approximated using the t or F distributions, but I am wondering if someone could point me to more information, or to an equation that describes exactly how these values are calculated.
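For the Pearson case the recipe is the classical one: under the null hypothesis of zero correlation, t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with n - 2 degrees of freedom, and the reported p-value is the two-sided tail area. A quick sketch to check this against rcorr's output, using made-up data:

library(Hmisc)

# made-up paired series, just to have something to test against
set.seed(1)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)

rc <- rcorr(cbind(x, y), type = "pearson")
r <- rc$r["x", "y"]
n <- rc$n["x", "y"]

# recompute the asymptotic p-value by hand
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p_hand <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
c(rcorr = rc$P["x", "y"], by_hand = p_hand)  # the two should agree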
First post!
I'm a biologist with a limited background in applied statistics and R. Basically, I know enough to be dangerous, so I'd appreciate it if someone could confirm or deny that I'm on the right path.
My dataset consists of count data (wildlife visits to water wells) as the response variable and multiple continuous predictor variables (environmental measurements).
First, I eliminated multicollinearity by dropping a few predictor variables. Second, I investigated the distribution of the response variable. Initially, it looked Poisson. However, a Poisson exact test came back significant, and the variance of the response variable was around 200 with a mean around 9, i.e. overdispersed. Because of this, I decided to move forward with negative binomial and quasi-Poisson regressions. Both selected the same model, whose residuals are approximately normally distributed; further, a plot of residuals against predicted values shows no bias or heteroscedasticity.
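For concreteness, a minimal sketch of that workflow, with made-up names (a data frame wells, count response visits, predictors temp and dist). The deviance ratio is a common conditional overdispersion check, which bears on question 3 below: overdispersion is usually judged conditional on the predictors rather than from the raw marginal mean and variance.

library(MASS)  # for glm.nb

# 'wells' is a hypothetical data frame; 'visits' is the count response,
# 'temp' and 'dist' stand in for the continuous predictors
m_pois <- glm(visits ~ temp + dist, family = poisson, data = wells)

# conditional overdispersion check: residual deviance / residual df
# should be near 1 for a well-fitting Poisson model
deviance(m_pois) / df.residual(m_pois)

# the two alternatives that allow variance > mean
m_qp <- glm(visits ~ temp + dist, family = quasipoisson, data = wells)
m_nb <- glm.nb(visits ~ temp + dist, data = wells)

Note that the quasi-Poisson fit has no true likelihood (its AIC is undefined), so likelihood-based comparisons only apply to the negative binomial model.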
Questions:
1. Have I selected the correct regressions to model this data?
2. Are there additional assumptions of the NBR and QpR that I need to test? How, and where, can I learn to do this?
3. Did I check for overdispersion correctly? Is there a difference between comparing the marginal mean and variance and comparing the conditional mean and variance of the response variable?
4. While the NBR and QpR selected the same model, is there a way to decide which is the "better" approach?
5. I would like to eventually publish. Are there more analyses I should perform on my selected model?
I am working with a database that involves weighted statistics and complex survey design, namely the National/Nationwide Inpatient Sample, so I am using R's 'survey' package for tasks like summary statistics, regression, etc.
One test I was interested in running compares the prevalence of a certain group across two different populations (i.e., is the difference between the proportion of A in Population B and the proportion of A in Population C statistically significant?). I have already used svyciprop to plot confidence intervals for these proportions and have seen that the two are significantly different. However, I was wondering whether there is a function like prop.test, in the 'survey' package or elsewhere, that can run this test for complex survey data (e.g., one that takes a survey design object as an argument) and actually quantify the p-value / t statistic. Does svyttest have this functionality, or is there another function I could potentially use?
Thank you!
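A minimal sketch of what should work, assuming a design object des, a 0/1 indicator is_A for membership in group A, and a two-level factor pop distinguishing Populations B and C (all names hypothetical). Since a proportion is just the mean of an indicator variable, svyttest on the indicator compares the two proportions directly:

library(survey)

# design-based t test: difference in proportions with a design-based
# standard error, t statistic, and p-value
svyttest(is_A ~ pop, design = des)

# a Rao-Scott corrected chi-squared test is an alternative
svychisq(~ is_A + pop, design = des)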
I have data that includes several different types:
a <- data.frame(
  x = c("a", "b", "b", "c", "c", "c", "d", "d", "e", "f"),
  y = c(1, 2, 2, 2, 3, 1, 4, 7, 10, 2),
  m = c("a", "d", "ab", "ac", "ac", "vc", "ed", "ed", "e", "df"),
  n = c(2, 1, 5, 3, 3, 2, 8, 10, 10, 1)
)
Actually, the real data are more complex than this, and probably include dates as well. Furthermore, this is an unsupervised problem, so there are no class labels here, which means I cannot use methods such as ANOVA. How can I find the correlation between each pair of columns?
P.S. I found a function called mixed.cor in the psych package, but I cannot understand how to use it.
Furthermore, correlation only captures linear relationships. What function should I use if I want to know the importance of every column?
The measure of correlation that most people use for numeric variables (i.e. Pearson correlation) is not defined for categorical data. If you want to measure the association between a numeric variable and a categorical variable, you can use ANOVA. If you want to measure the association between two categorical variables, you can use a Chi-Squared test. If your categorical variable is ordered (e.g. low, medium, high), you can use Spearman rank correlation.
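To make that concrete with the example frame a from the question (toy illustrations only; with so few rows R will warn that the chi-squared approximation is poor):

# numeric ~ categorical: one-way ANOVA
summary(aov(y ~ x, data = a))

# categorical ~ categorical: chi-squared test of association
chisq.test(table(a$x, a$m))

# numeric or ordered pairs: Spearman rank correlation
cor.test(a$y, a$n, method = "spearman")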