I'm researching moth biomass in different biotopes and want to find a model that estimates biomass. I have measured the length and width of the forewing, abdomen, and thorax of 37,088 specimens, and I have weighed each specimen individually after drying.
First, I wanted to fit a simple linear regression of biomass on each variable. The problem is that none of the assumptions are met: the relationships are not linear, biomass (and some of the predictors) are not normally distributed, there is heteroskedasticity, and there are many outliers. I have tried transforming the data with log, x^2, 1/x, and Box-Cox transformations, but none of them helped. I have also tried Theil-Sen regression (not possible because there is too much data) and Siegel regression (biomass is not a vector). Is there another form of non-parametric or median-based regression I could try? I am really out of ideas.
Here is a frequency histogram of dry biomass:
[Figure: frequency histogram of dry biomass]
So what I actually want to do is build a model that accurately estimates dry biomass from the measurements I took. I have a power function (Rogers et al.) that is general for all insects, but its estimates differ significantly from the weights I actually measured. Therefore, I want to build a model with all significant variables. I am not very familiar with power functions, but maybe it is possible to build one myself? Can anyone recommend a method? Thanks in advance.
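To fit a power function, you could try nlsLM() from the minpack.lm package. Like nls(), it should be given starting values for the parameters:
library(minpack.lm)
m <- nlsLM(y ~ a * x^b, data = your.data.here, start = list(a = 1, b = 1))
Then check whether it performs satisfactorily.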
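If you want to build the power function yourself, here is a minimal base-R sketch on simulated data (the column names wing and mass and the true values a = 0.02, b = 2.5 are hypothetical). Taking logs turns biomass = a * length^b into a straight line, log(biomass) = log(a) + b * log(length), so a log-log lm() gives estimates that are also good starting values for a non-linear fit with nls().

```r
# Hypothetical example: simulated wing lengths (mm) and dry masses (mg).
# Replace with your own data frame and column names.
set.seed(1)
n <- 500
moths <- data.frame(wing = runif(n, 5, 25))
moths$mass <- 0.02 * moths$wing^2.5 * exp(rnorm(n, sd = 0.2))  # multiplicative error

# 1. Log-log linear model: log(mass) = log(a) + b * log(wing)
loglog <- lm(log(mass) ~ log(wing), data = moths)
start_vals <- list(a = exp(unname(coef(loglog)[1])),
                   b = unname(coef(loglog)[2]))

# 2. Non-linear least squares on the original scale, started from the log-log fit
pow <- nls(mass ~ a * wing^b, data = moths, start = start_vals)

coef(loglog)  # intercept is log(a), slope is the exponent b
coef(pow)     # a and b on the original scale
```

With several measurements, the same idea extends to a multiplicative model such as lm(log(mass) ~ log(wing) + log(thorax_width)), where thorax_width is a hypothetical column name and each slope is an exponent. Note that back-transforming predictions from a log-log fit estimates the median rather than the mean biomass, so a bias correction may be needed.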
I have been trying to calculate the correlation (association) between two categorical variables (level of education and political party voted for) in R. I found that I could use Pearson's chi-squared test. I applied it and got some results, but I wonder whether it is an appropriate method.
Also, how can I visualize the results of the Pearson test in R?
Thanks in advance.
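A minimal base-R sketch with a hypothetical table of counts (the education levels, parties, and numbers are made up). The chi-squared test only tells you whether the two variables are associated; Cramér's V, computed from the same statistic as sqrt(X^2 / (n * (min(rows, cols) - 1))), is a common effect size on a 0 to 1 scale, and the standardized residuals show which cells drive the association. mosaicplot() with shade = TRUE colours each cell by its residual from the independence model.

```r
# Hypothetical contingency table: rows = education level, columns = party voted for
votes <- matrix(c(30, 20, 10,
                  25, 25, 15,
                  10, 30, 35),
                nrow = 3, byrow = TRUE,
                dimnames = list(education = c("primary", "secondary", "tertiary"),
                                party = c("A", "B", "C")))

test <- chisq.test(votes)
test$statistic
test$p.value
test$stdres  # standardized residuals: large |values| mark cells far from independence

# Cramer's V: chi-squared rescaled to 0..1 as a measure of strength
cramers_v <- sqrt(unname(test$statistic) /
                  (sum(votes) * (min(dim(votes)) - 1)))
cramers_v

# Mosaic plot shaded by residuals from the independence model
mosaicplot(votes, shade = TRUE, main = "Education vs. party voted for")
```

If your data has one row per respondent, build the table first with table(df$education, df$party) (data frame and column names hypothetical) and pass it to the same code.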
I am having difficulty creating a plot like this one:
[Figure: summary plot of regression estimates]
A to H are different exposures for the same outcome. The blue lines are the estimates and 95% CIs from univariable models (for example, outcome ~ A and outcome ~ B), and the red lines are the estimates and 95% CIs from a multivariable model (outcome ~ A+B+C+D+E+F+G+H).
I tried the plot_summs() function, but it cannot handle this many univariable models.
Thank you for your help. Looking forward to hearing from you.
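A base-R sketch on simulated data (the exposures A to H and their effect sizes are hypothetical): it fits the eight univariable models and the multivariable model with lm(), collects each estimate with its confint() interval, and draws both sets side by side with a small vertical offset, blue for univariable and red for multivariable.

```r
# Hypothetical data: one continuous outcome and eight exposures named A..H
set.seed(42)
n <- 300
exposures <- LETTERS[1:8]
dat <- as.data.frame(matrix(rnorm(n * 8), ncol = 8,
                            dimnames = list(NULL, exposures)))
dat$outcome <- 0.5 * dat$A - 0.3 * dat$C + 0.2 * dat$F + rnorm(n)

# Estimate and 95% CI for one term of a fitted model
term_ci <- function(fit, term) {
  ci <- confint(fit, term)
  c(est = unname(coef(fit)[term]), lower = ci[1, 1], upper = ci[1, 2])
}

# Univariable models: outcome ~ A, outcome ~ B, ...
uni <- t(sapply(exposures, function(x) {
  term_ci(lm(reformulate(x, response = "outcome"), data = dat), x)
}))

# Multivariable model: outcome ~ A + B + ... + H
multi_fit <- lm(reformulate(exposures, response = "outcome"), data = dat)
multi <- t(sapply(exposures, function(x) term_ci(multi_fit, x)))

# Forest-style plot: blue = univariable (shifted up), red = multivariable (shifted down)
y <- seq_along(exposures)
plot(NA, xlim = range(uni, multi), ylim = c(0.5, length(exposures) + 0.5),
     yaxt = "n", xlab = "Estimate (95% CI)", ylab = "")
axis(2, at = y, labels = exposures, las = 1)
abline(v = 0, lty = 2)
segments(uni[, "lower"], y + 0.15, uni[, "upper"], y + 0.15, col = "blue")
points(uni[, "est"], y + 0.15, pch = 19, col = "blue")
segments(multi[, "lower"], y - 0.15, multi[, "upper"], y - 0.15, col = "red")
points(multi[, "est"], y - 0.15, pch = 19, col = "red")
legend("topright", legend = c("Univariable", "Multivariable"),
       col = c("blue", "red"), pch = 19, lty = 1)
```

The two matrices uni and multi could equally be stacked into one data frame and drawn with ggplot2; for logistic models the estimates and limits would usually be exponentiated before plotting.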
I have constructed a mixed-effects model using lmer() with the aim of comparing the growth in reading scores of four groups of children as they age.
I would like to plot the four slopes with confidence intervals in R to visualize this relationship, but I keep getting stuck.
I have tried plot() and some versions of ggplot(), as I did for previous lm() models, but so far without success. Here is my attempted model, which I hope tests how the change in reading scores over time (age) interacts with a child's SESDLD grouping (this indicates whether a child has a language problem and whether the child is from a high- or low-income family).
AgeSES.model <- lmer(ReadingMeasure ~ Age.c*SESDLD1 + (1|childid), data = reshapedomit, REML = FALSE)
ReadingMeasure is a continuous score, and Age.c is centered age in months. SESDLD1 is a categorical variable with four levels. I would expect four positive slopes of ReadingMeasure growth, with different intercepts and probably different slopes.
I would really appreciate any pointers on how to do this!
Thank you so much!!
[Figure: the type of plot I would like to achieve, made in Stata]
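A minimal sketch in R, assuming the lme4 package and simulated stand-in data (the group labels G1 to G4, the ages, and the effect sizes are hypothetical; the column names follow the question). It computes population-level predicted lines from the fixed effects and Wald-type 95% confidence bands from vcov(), then draws one line and band per SESDLD1 group. These bands ignore uncertainty in the random-effect variance, so treat them as approximate.

```r
library(lme4)

# Hypothetical stand-in for reshapedomit: 80 children, 5 visits each
set.seed(7)
n_child <- 80
ages <- c(-12, -6, 0, 6, 12)  # centered age in months
kids <- data.frame(childid = factor(1:n_child),
                   SESDLD1 = factor(rep(c("G1", "G2", "G3", "G4"), each = n_child / 4)))
reshapedomit <- merge(kids, data.frame(Age.c = ages))  # no shared columns: cross join
grp_slope <- c(G1 = 1.0, G2 = 0.8, G3 = 0.6, G4 = 0.4)[as.character(reshapedomit$SESDLD1)]
child_int <- rnorm(n_child, sd = 3)[as.integer(reshapedomit$childid)]
reshapedomit$ReadingMeasure <- 50 + child_int + grp_slope * reshapedomit$Age.c +
  rnorm(nrow(reshapedomit), sd = 2)

AgeSES.model <- lmer(ReadingMeasure ~ Age.c * SESDLD1 + (1 | childid),
                     data = reshapedomit, REML = FALSE)

# Grid of ages x groups for population-level (fixed-effects only) predictions
newdat <- expand.grid(Age.c = seq(min(ages), max(ages), length.out = 25),
                      SESDLD1 = levels(reshapedomit$SESDLD1))
X <- model.matrix(~ Age.c * SESDLD1, data = newdat)
newdat$fit <- drop(X %*% fixef(AgeSES.model))
se <- sqrt(diag(X %*% as.matrix(vcov(AgeSES.model)) %*% t(X)))
newdat$lower <- newdat$fit - 1.96 * se
newdat$upper <- newdat$fit + 1.96 * se

# One line plus shaded confidence band per group
cols <- c("black", "blue", "red", "darkgreen")
grps <- levels(newdat$SESDLD1)
plot(NA, xlim = range(newdat$Age.c), ylim = range(newdat$lower, newdat$upper),
     xlab = "Centered age (months)", ylab = "Predicted ReadingMeasure")
for (i in seq_along(grps)) {
  d <- newdat[newdat$SESDLD1 == grps[i], ]
  polygon(c(d$Age.c, rev(d$Age.c)), c(d$lower, rev(d$upper)),
          col = adjustcolor(cols[i], alpha.f = 0.2), border = NA)
  lines(d$Age.c, d$fit, col = cols[i], lwd = 2)
}
legend("topleft", legend = grps, col = cols, lwd = 2)
```

The fitted lines are the same as predict(AgeSES.model, newdata = newdat, re.form = NA). If you also want the four slopes themselves with confidence intervals, the emmeans package provides emtrends(AgeSES.model, ~ SESDLD1, var = "Age.c").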
I have two models fitted with lmer(). The homework asks me to compare them graphically and numerically, but I don't know how to do this beyond comparing the AIC, fixed effects, and random effects of the two models.
The first model:
child.mutil <- lmer(CD4PCT ~ time1 + (1 | newpid), data = HIV)
The second model:
child.mutil2 <- lmer(CD4PCT ~ time1 + treatment.group + visage.group + (1 | newpid), data = HIV2)
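One way to compare the models, sketched in R with lme4 on simulated stand-in data (all values are hypothetical; the column names follow the question). The question fits the two models to HIV and HIV2; AIC and likelihood-ratio comparisons are only meaningful when both models use exactly the same rows, so the sketch fits both to one data frame. anova() on two lmer fits refits them by maximum likelihood and reports AIC, BIC, log-likelihood, and a likelihood-ratio test; VarCorr() shows how the between-child variance changes once the extra predictors are added; the plots compare fitted with observed values.

```r
library(lme4)

# Hypothetical stand-in for the HIV data: 60 children, 4 visits each
set.seed(3)
n_id <- 60
HIV2 <- data.frame(newpid = factor(rep(1:n_id, each = 4)),
                   time1 = rep(c(0, 0.5, 1, 1.5), times = n_id))
HIV2$treatment.group <- factor(rep(sample(c("ctrl", "trt"), n_id, replace = TRUE), each = 4))
HIV2$visage.group <- factor(rep(sample(c("young", "old"), n_id, replace = TRUE), each = 4))
HIV2$CD4PCT <- 30 - 3 * HIV2$time1 + 4 * (HIV2$treatment.group == "trt") +
  rep(rnorm(n_id, sd = 5), each = 4) + rnorm(nrow(HIV2), sd = 3)

# Fit both models to the SAME data so the comparison is valid
child.mutil  <- lmer(CD4PCT ~ time1 + (1 | newpid), data = HIV2)
child.mutil2 <- lmer(CD4PCT ~ time1 + treatment.group + visage.group + (1 | newpid),
                     data = HIV2)

# Numerical: ML refit, AIC/BIC/logLik and likelihood-ratio test
cmp <- anova(child.mutil, child.mutil2)
cmp

# Fixed effects and variance components side by side
fixef(child.mutil)
fixef(child.mutil2)
as.data.frame(VarCorr(child.mutil))
as.data.frame(VarCorr(child.mutil2))

# Graphical: observed vs. fitted (fitted values include the random intercepts)
op <- par(mfrow = c(1, 2))
plot(fitted(child.mutil), HIV2$CD4PCT, xlab = "Fitted", ylab = "Observed",
     main = "child.mutil")
abline(0, 1, lty = 2)
plot(fitted(child.mutil2), HIV2$CD4PCT, xlab = "Fitted", ylab = "Observed",
     main = "child.mutil2")
abline(0, 1, lty = 2)
par(op)
```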