I am trying to do a logistic regression in R with weights, but I don't really understand how the weights work. When I apply them, something strange happens: all the fitted values come out as 1, and I don't see why. (Also: how can I fit a line through the points?)
I would like to calculate a correlation coefficient between the observed and the predicted values. I am also aiming for a plot with "fra" on the y-axis ranging from 0 to 1, temp on the x-axis, the fra values as points, and a line for the regression (something like this example: http://imgur.com/FWevi36)
Thanks!
What I have so far (made-up example code):
#Data frame
temp <- c(1,1,2,2,3,4,4,5,5,6,6,7,7,8,8)
fra  <- c(0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.2,0.2,0.3,0.1,0.3,0.4,0.0,0.5)
bin  <- c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1)
test1 <- data.frame(temp, bin, fra)
#Overview
plot(test1$temp, test1$bin)
plot(test1$fra)
boxplot(temp ~ bin, data = test1, horizontal = TRUE)
#Logistic regression without weights
glmt1 <- glm(bin ~ temp, data = test1, family = binomial)
coefficients(summary(glmt1))
fit1 <- fitted(glmt1)
#Plot; sort by temp so lines() does not zig-zag between points
plot(test1$temp, fit1, ylim = c(0, 1))
lines(sort(test1$temp), fit1[order(test1$temp)], col = "red")
#With weights
glmt2 <- glm(bin ~ temp, data = test1, family = binomial, weights = fra)
coefficients(summary(glmt2))
fit2 <- fitted(glmt2)
plot(test1$temp, fit2, ylim = c(0, 1))
You are only giving a positive weight to cases where bin == 1 (fra is 0 exactly where bin is 0, so those rows are effectively dropped). That removes all variation in the response variable on the left-hand side. As a result the model always predicts 1, no matter what the value of test1$temp is.
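To see this, and to get the line through the points, here is a small sketch built only on the objects above:
subset(test1, fra > 0)  #every surviving row has bin == 1
#Fitted probability curve from the unweighted model, predicted over a fine grid
plot(test1$temp, test1$bin, ylim = c(0, 1), xlab = "temp", ylab = "P(bin = 1)")
temp.grid <- seq(min(test1$temp), max(test1$temp), length.out = 100)
lines(temp.grid,
      predict(glmt1, newdata = data.frame(temp = temp.grid), type = "response"),
      col = "red")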
I want to know if there is a significant difference in a blood biomarker concentration between 2 populations (population 1 = healthy individuals, population 2 = sick individuals). I need to control for the factor 'region'.
My issue is that the distribution of population 2 is not normal: the data are right-censored at the upper detection limit of the lab device.
With a normal distribution I would use this model in R:
library(emmeans)
m <- glm(blood.biomarker ~ status * region, data = f, family = "gaussian") # status = healthy or sick; status * region expands to the main effects plus their interaction
summary(m)
emmeans(m, list(pairwise ~ status), adjust = "tukey")
I am a bit confused about which model or glm family I should use in this case.
I also have a similar situation but with 3 groups (1 group has a normal distribution and 2 groups have censored distributions). How should I deal with that?
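One standard way to handle a known upper detection limit is a censored ("Tobit-style") Gaussian regression, for example with survival::survreg. The sketch below is only an illustration: the detection limit value is hypothetical, and the column names are taken from the question.
library(survival)
upper.limit <- 100  #hypothetical detection limit of the lab device
#event = 1 for a fully observed value, 0 for a value censored at the limit
f$observed <- as.numeric(f$blood.biomarker < upper.limit)
#Gaussian model with right-censoring = Tobit-style censored regression
m.cens <- survreg(Surv(blood.biomarker, observed) ~ status * region,
                  data = f, dist = "gaussian")
summary(m.cens)
emmeans also supports survreg objects, so the pairwise comparison above should carry over unchanged.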
I have a response variable and an independent variable that visually fit a saturation growth-rate model. How can I fit such a model in R? Thank you!
Give the nls function a try, but next time please provide some example data. I will use the data from this excellent tutorial by a colleague (https://bscheng.com/2014/05/07/modeling-logistic-growth-data-in-r/):
library("car"); library("ggplot2")
#Here's the data
mass<-c(6.25,10,20,23,26,27.6,29.8,31.6,37.2,41.2,48.7,54,54,63,66,72,72.2,
76,75) #Wilson's mass in pounds
days.since.birth<-c(31,62,93,99,107,113,121,127,148,161,180,214,221,307,
452,482,923, 955,1308) #days since Wilson's birth
data<-data.frame(mass,days.since.birth) #create the data frame
plot(mass~days.since.birth, data=data) #always look at your data first!
wilson<-nls(mass~phi1/(1+exp(-(phi2+phi3*days.since.birth))),
start=list(phi1=100,phi2=-1.096,phi3=.002),data=data,trace=TRUE)
#set parameters
phi1<-coef(wilson)[1]
phi2<-coef(wilson)[2]
phi3<-coef(wilson)[3]
x<-c(min(data$days.since.birth):max(data$days.since.birth)) #construct a range of x values bounded by the data
y<-phi1/(1+exp(-(phi2+phi3*x))) #predicted mass
predict<-data.frame(x,y) #create the prediction data frame#And add a nice plot (I cheated and added the awesome inset jpg in another program)
ggplot(data=data,aes(x=days.since.birth,y=mass))+
geom_point(color='blue',size=5)+theme_bw()+
labs(x='Days Since Birth',y='Mass (lbs)')+
scale_x_continuous(breaks=c(0,250,500,750, 1000,1250))+
scale_y_continuous(breaks=c(0,10,20,30,40,50,60,70,80))+
theme(axis.text=element_text(size=18),axis.title=element_text(size=24))+
geom_line(data=predict,aes(x=x,y=y), size=1)
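The question asked specifically about a saturation growth-rate model, i.e. y = a*x/(b + x), and the same nls approach handles that form too. A sketch on the same data (the start values are rough guesses; the self-starting stats::SSmicmen model would choose them automatically):
#Saturation growth-rate (Michaelis-Menten-type) fit on the same data
sat <- nls(mass ~ a*days.since.birth/(b + days.since.birth),
           start = list(a = 80, b = 100), data = data)
summary(sat)
plot(mass ~ days.since.birth, data = data)
lines(x, predict(sat, newdata = data.frame(days.since.birth = x)), col = "red")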
I have a matrix of plants (rows) and pollinators (columns) with interaction frequencies inside (converted to 0 = no interaction and 1 = interaction(s) present for this analysis).
I'm using the vegan package and have produced a species accumulation curve.
accum <- specaccum(mydata[1:47,], method = "random", permutations = 1000)
plot(accum)
I would now like to predict how many new pollinator species I would be likely to find with additional plant sampling, but I can't figure out in what format I have to supply "newdata" to the predict command. I have tried empty rows and rows of zeros within the matrix but was not able to get results. This is the code I've used for the prediction:
predictaccum1 <- predict(accum, newdata=mydata[48:94,])
The error message:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "specaccum"
The error message does not change if I specify interpolation = "linear" or "spline".
Could anyone help please?
This is perhaps not the clearest way of putting it, but the documentation says:
newdata: Optional data used in prediction interpreted as number of
sampling units (sites).
So newdata should be the number of sampling units (sites) you had: a single number or a vector of numbers will do. However, the predict function cannot extrapolate; it only interpolates. The non-linear regression models of fitspecaccum may be able to extrapolate, but should you trust them?
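Reusing the objects from the question, the call would look like this (a minimal sketch; the values must stay within the 47 sampled sites, since predict only interpolates):
#newdata is a count (or counts) of sampling units, not a data frame
predict(accum, newdata = c(10, 20, 30, 47))
#spline instead of linear interpolation, as tried in the question
predict(accum, newdata = c(10, 20, 30, 47), interpolation = "spline")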
Here is a bit about the dangers of extrapolation: non-linear regression models are conventionally used for analysing species accumulation data, but none of them is really firmly based on theory -- they are just some nice non-linear regression models. I know of some models that may have a firmer basis, but we haven't implemented them in vegan, nor do we plan to (contributions are welcome). However, you can get some idea of the problems by subsampling your data and checking whether an extrapolation from the subsample recovers the overall number of species. The following shows how to do this with the BCI data in vegan. These data have 50 sample plots with 225 species. We take subsamples of 25 plots and extrapolate to 50:
mod <- c("arrhenius", "gleason", "gitay", "lomolino", "asymp", "gompertz",
"michaelis-menten", "logis", "weibull")
extraps <- matrix(NA, 100, length(mod))
colnames(extraps) <- mod
for(i in 1:nrow(extraps)) {
## use the same accumulation for all nls models
m <- specaccum(BCI[sample(50,25),], "exact")
for(p in mod) {
## need try because some nls models can fail
tmp <- try(predict(fitspecaccum(m, p), newdata=50))
if(!inherits(tmp, "try-error")) extraps[i,p] <- tmp
}
}
When I tried this, most extrapolation models did not include the correct number of species among their predictions: all values were either higher than the correct richness (from worst: Arrhenius, Gitay, Gleason) or lower than the correct richness (from worst: logistic, Gompertz, asymptotic, Michaelis-Menten, Lomolino, Weibull); only the last two of these included the correct richness in their range.
In summary: in the absence of theory and an adequate model, beware of extrapolation.
How can I create a normal probability plot of residuals in R so that there are normal probability values on the y-axis?
Normally you would make the normal probability plot with qqnorm and qqline.
Example (resp, dep1 and dep2 are placeholders for your own response and predictor variables; any lm fit works the same way):
fit <- lm(resp ~ dep1 + dep2)
qqnorm(residuals(fit), datax = TRUE)
qqline(residuals(fit), datax = TRUE)
You can plot the residuals against their normal probabilities with plot and pnorm (the probabilities end up on the y-axis):
plot(residuals(fit), pnorm(residuals(fit)))
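A slightly more careful version (a sketch, going beyond the answer above): pnorm with default arguments assumes mean 0 and standard deviation 1, so standardize the residuals first, and add the theoretical curve for comparison:
r <- sort(scale(residuals(fit)))  #standardized, sorted residuals
p <- ppoints(length(r))           #empirical cumulative probabilities
plot(r, p, xlab = "Standardized residual", ylab = "Normal probability")
lines(r, pnorm(r))                #theoretical normal CDF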
It would be great if someone could check whether my approach is correct or not.
In short: is my error calculation correct?
Let's assume I have the following data:
data = c(23.7,25.47,25.16,23.08,24.86,27.89,25.9,25.08,25.08,24.16,20.89)
Furthermore, I want to check whether my data follow a normal distribution.
Edit: I know that there are tests etc., but I will concentrate on constructing the QQ plot with confidence lines. I know that there is a method in the car package, but I want to understand how these lines are built.
So I calculate the percentiles for my sample data as well as for my theoretical distribution (with estimated mu = 24.6609 and sigma = 1.6828), and end up with these two vectors containing the percentiles:
percentileReal = c(23.08,23.7,24.16,24.86,25.08,25.08,25.16,25.47,25.90)
percentileTheo = c(22.50,23.24,23.78,24.23,24.66,25.09,25.54,26.08,26.82)
Now I want to calculate the confidence interval for alpha = 0.05 for the theoretical percentiles. If I remember correctly, the formula is
error = z*sigma/sqrt(n),
interval = value +- error
with n = length(data) and z the quantile of the normal distribution for the given p.
So in order to get the confidence interval for the 2nd percentile (p = 0.20) I do the following:
error = (qnorm(0.20 + alpha/2, mu, sigma) - qnorm(0.20 - alpha/2, mu, sigma))*sigma/sqrt(n)
Insert the values:
error = (qnorm(0.225,24.6609,1.6828)-qnorm(0.175,24.6609,1.6828)) * 1.6828/sqrt(11)
error = 0.152985
confidenceInterval(for 2nd percentile) = [23.24 - 0.152985, 23.24 + 0.152985]
confidenceInterval(for 2nd percentile) = [23.0870, 23.3930]
Finally I have
percentileTheoLower = c(...,23.0870,.....)
percentileTheoUpper = c(...,23.3930,.....)
and the same for the rest...
So what do you think, can I go with it?
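For reference, here is a minimal sketch of how pointwise confidence bands like those drawn by car::qqPlot can be built. Instead of the mean-type formula above, it uses the asymptotic standard error of the p-th sample quantile, sigma*sqrt(p*(1-p)/n)/dnorm(z), with z the standard normal quantile at p:
data <- c(23.7,25.47,25.16,23.08,24.86,27.89,25.9,25.08,25.08,24.16,20.89)
n <- length(data); mu <- mean(data); sigma <- sd(data); alpha <- 0.05
p <- ppoints(n)                  #plotting positions
z <- qnorm(p)                    #standard normal quantiles
fit <- mu + sigma*z              #fitted line on the QQ plot
se <- (sigma/dnorm(z))*sqrt(p*(1 - p)/n)  #SE of the p-th quantile
qqnorm(data)
lines(z, fit)                                    #reference line
lines(z, fit + qnorm(1 - alpha/2)*se, lty = 2)   #upper band
lines(z, fit - qnorm(1 - alpha/2)*se, lty = 2)   #lower band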
If your goal is to test whether the data follow a normal distribution, use the Shapiro-Wilk test:
shapiro.test(data)
# Shapiro-Wilk normality test
# data: data
# W = 0.9409, p-value = 0.5306
The p-value is the probability of observing a sample at least as non-normal as this one if the data really came from a normal distribution. Since p > 0.05, we cannot reject the hypothesis of normality.
You can also check normality visually with qqnorm(...): the more nearly linear the plot, the more consistent your data are with a normal distribution.
qqnorm(data)
qqline(data)
Finally, there is the nortest package in R which has, among other things, the Pearson chi-square test for normality:
library(nortest)
pearson.test(data)
# Pearson chi-square normality test
# data: data
# P = 3.7273, p-value = 0.2925
This (more conservative) test also fails to reject normality, though with a smaller p-value (p = 0.29). All these tests are fully explained in the documentation.