How do I fix the abline warning, only using first two coefficients? - r

I have been unable to resolve an error when using abline(). I keep getting the warning message: In abline(model): only using the first two of 7 regression coefficients. I've been searching and seen many instances of others with this error but their examples are for multiple linear functions. I'm new to R and below is a simple example I'm using to work with. Thanks for any help!
year = c('2010','2011','2012','2013','2014','2015','2016')
population = c(25244310,25646389,26071655,26473525,26944751,27429639,27862596)
Texas=data.frame(year,population)
plot(population~year,data=Texas)
model = lm(population~year,data=Texas)
abline(model)

You probably want something like the following where we make sure that year is interpreted as a numeric variable in your model:
plot(population ~ year, data = Texas)
model <- lm(population ~ as.numeric(as.character(year)), data = Texas)
abline(model)
This makes lm estimate an intercept (corresponding to year 0) and a slope (the mean increase in population per year), which abline interprets correctly, as can also be seen on the plot.
The reason for the warning is that year becomes a factor with 7 levels, so your lm call estimates the mean value for the reference year 2010 (the intercept) and 6 contrasts to the other years. Hence you get seven coefficients, and abline incorrectly uses only the first two as intercept and slope.
Edit: With that said, you probably want to change the way year is stored to numeric. Then your code works, and plot also makes a proper scatter plot with a regression line.
Texas$year <- as.numeric(as.character(Texas$year))
plot(population ~ year, data = Texas, pch = 16)
model <- lm(population ~ year, data = Texas)
abline(model)
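Note that as.character is needed in general, because as.numeric applied directly to a factor returns the integer level codes (1, 2, ...) rather than the years. Without it, lm happens to give the same slope here only by coincidence (because the years are consecutive).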
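To see why the factor matters, a small check along these lines may help. It is only a sketch: it builds year explicitly with factor() so the behaviour does not depend on the R version, and the names texas_f, fit_factor and fit_numeric are made up for illustration.

```r
# Same data as in the question; year is made a factor explicitly
texas_f <- data.frame(
  year = factor(c("2010", "2011", "2012", "2013", "2014", "2015", "2016")),
  population = c(25244310, 25646389, 26071655, 26473525,
                 26944751, 27429639, 27862596)
)

# as.numeric() on a factor returns the level codes, not the labels
codes <- as.numeric(texas_f$year)                # 1 2 3 4 5 6 7
years <- as.numeric(as.character(texas_f$year))  # 2010 2011 ... 2016

# Factor predictor: one intercept plus six contrasts (7 coefficients)
fit_factor <- lm(population ~ year, data = texas_f)
length(coef(fit_factor))

# Numeric predictor: intercept and slope only, which is what abline() expects
texas_f$year_num <- years
fit_numeric <- lm(population ~ year_num, data = texas_f)
length(coef(fit_numeric))
```

Checking length(coef(model)) before calling abline is a quick way to spot this problem in other data sets.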

Related

How to extract a single Y variable value using a X variable without using a plot

me again!
In one of my assignments I have to create a plot with a regression line and simply read values off this plot.
Question: "at 80 degrees F what is the wind-speed?"
By simply looking at the plot you can state it's ~9 m/s at 80 F. This would suffice, but knowing what you can do in R, I would like to know how to do this properly, both for now and for future reference.
How does one, using only the data frame (in picture), extract a Y value for a given X value using linear regression?
Linear regression because the value itself isn't given, but it can be estimated if you assume a linear relationship.
So in essence, instead of reading the value off the plot (pic 2), I would like to use a function that, given an X (temp) value, prints out a Y (wind) value using linear regression.
I tried other stuff I found on Stack Overflow, using
lm(data~data, dataframe) but that didn't give me the result I desired.
You are probably looking for the predict function.
First fit a linear regression and then calculate the predicted value with predict. Just keep in mind that you have to supply your X value in a data.frame whose column name matches the predictor in the model.
datasets::airquality
lm_air <- lm(Wind ~ Temp, airquality)
predict(lm_air, data.frame(Temp = 80))
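As a rough illustration of what predict does here, the same value can be computed by hand from the fitted coefficients. The names fit, by_hand and by_predict are illustrative only.

```r
# Fit wind speed as a linear function of temperature (built-in airquality data)
fit <- lm(Wind ~ Temp, data = airquality)

# Prediction by hand: intercept + slope * 80
b <- coef(fit)
by_hand <- b[["(Intercept)"]] + b[["Temp"]] * 80

# The same prediction via predict(); the newdata column must be named Temp
by_predict <- predict(fit, newdata = data.frame(Temp = 80))
all.equal(unname(by_predict), by_hand)

# Several temperatures at once, with a confidence interval for the mean
predict(fit, newdata = data.frame(Temp = c(60, 70, 80)),
        interval = "confidence")
```

Passing a vector of temperatures in newdata returns one prediction per row, so there is no need to loop.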

R Cross Validation lm predict function [duplicate]

I am trying to convert Absorbance (Abs) values to Concentration (ng/mL), based on an established linear model & standard curve. I planned to do this by using the predict() function. I am having trouble getting predict() to return the desired results. Here is a sample of my code:
Standards<-data.frame(ng_mL=c(0,0.4,1,4),
Abs550nm=c(1.7535,1.5896,1.4285,0.9362))
LM.2<-lm(log(Standards[['Abs550nm']])~Standards[['ng_mL']])
Abs<-c(1.7812,1.7309,1.3537,1.6757,1.7409,1.7875,1.7533,1.8169,1.753,1.6721,1.7036,1.6707,
0.3903,0.3362,0.2886,0.281,0.3596,0.4122,0.218,0.2331,1.3292,1.2734)
predict(object=LM.2,
newdata=data.frame(Concentration=Abs[1]))#using Abs[1] as an example, but I eventually want predictions for all values in Abs
Running those last lines gives this output:
> predict(object=LM.2,
+ newdata=data.frame(Concentration=Abs[1]))
1 2 3 4
0.5338437 0.4731341 0.3820697 -0.0732525
Warning message:
'newdata' had 1 row but variables found have 4 rows
This does not seem to be the output I want. I am trying to get a single predicted Concentration value for each Absorbance (Abs) entry. It would be nice to be able to predict all of the entries at once and add them to an existing data frame, but I can't even get it to give me a single value correctly. I've read many threads on here, webpages found on Google, and all of the help files, and for the life of me I cannot understand what is going on with this function. Any help would be appreciated, thanks.
You must have a variable in newdata that has the same name as that used in the model formula used to fit the model initially.
You have two errors:
You don't use a variable in newdata with the same name as the covariate used to fit the model, and
You make the problem much more difficult to resolve because you abuse the formula interface.
Don't fit your model like this:
mod <- lm(log(Standards[['Abs550nm']])~Standards[['ng_mL']])
Fit your model like this:
mod <- lm(log(Abs550nm) ~ ng_mL, data = Standards)
Isn't that so much more readable?
To predict you would need a data frame with a variable ng_mL:
predict(mod, newdata = data.frame(ng_mL = c(0.5, 1.2)))
Now you may have a third error. You appear to be trying to predict with new values of Absorbance, but the way you fitted the model, Absorbance is the response variable. You would need to supply new values for ng_mL.
The behaviour you are seeing is what happens when R can't find a correctly-named variable in newdata; it returns the fitted values from the model or the predictions at the observed data.
This makes me think you have the formula back to front. Did you mean:
mod2 <- lm(ng_mL ~ log(Abs550nm), data = Standards)
If so, you'd need, say,
predict(mod2, newdata = data.frame(Abs550nm = c(1.7812,1.7309)))
Note you don't need to include the log() bit in the name. R recognises that as a function and applies it to the variable Abs550nm for you.
If the model really is log(Abs550nm) ~ ng_mL and you want to find values of ng_mL for new values of Abs550nm you'll need to invert the fitted model in some way.
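One simple way to invert the fit, sketched here under the assumption that a point estimate is all that is needed, is to solve the fitted line log(Abs550nm) = b0 + b1 * ng_mL for ng_mL. The helper invert_fit is hypothetical, and this ignores the uncertainty of the calibration entirely.

```r
# Standards from the question
Standards <- data.frame(ng_mL = c(0, 0.4, 1, 4),
                        Abs550nm = c(1.7535, 1.5896, 1.4285, 0.9362))
mod <- lm(log(Abs550nm) ~ ng_mL, data = Standards)

# Solve log(Abs) = b0 + b1 * ng_mL for ng_mL (hypothetical helper)
b <- coef(mod)
invert_fit <- function(abs_new) {
  (log(abs_new) - b[["(Intercept)"]]) / b[["ng_mL"]]
}

# Estimated concentrations for a few new absorbance readings
Abs <- c(1.7812, 1.7309, 1.3537)
conc <- invert_fit(Abs)
data.frame(Abs550nm = Abs, ng_mL_est = conc)
```

Readings outside the range of the standards are extrapolations and can come out negative, so those estimates should be treated with caution.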

Plot linear and multiple linear reg on the same graph (ggplot)

I have for instance this data frame :
data <- data.frame(
x=c(1:12)
, case=c(3,5,1,8,2,4,5,0,8,2,3,5)
, rain=c(1,8,2,1,4,5,3,0,8,2,3,4)
, country=c("A","A","A","A","B","B","B","B","C","C","C","C")
, year=rep(seq(2000,2003,1),3)
)
I would like to perform 2 linear regressions and plot them on one graph.
In a nutshell, I would like to compare the crude trend of cases over time (simple lm) with the same trend of cases but this time adjusted to rainfall over the years 2000 to 2003, on one and same graph.
model<-lm(case~ year, data=data)
the second one would be a multiple linear regression. I used this code for the purpose, but not sure it is ideal.
modelrain<-lm(case~ I(year +rain), data=data)
I did it with a simple plot with abline, but don't know how to make it with ggplot. I've created a new dataframe, but doesn't seem to work perfectly ( so I don't put the rest of my code here).
Thank you very much
Building off the suggestions in the comments, there are three valid regression models:
model1<-lm(case~ year, data=data)
summary(model1)
model2<-lm(case~ year+rain, data=data)
summary(model2)
model3<-lm(case~ year*rain, data=data)
summary(model3)
With the limited data we have, there doesn't seem to be a lot going on.
The first question, how to plot the regression line for model1 using ggplot, is just:
library(ggplot2)
ggplot(data,aes(x=year,y=case)) + geom_point() + geom_smooth(method="lm")
As others have noted it is unclear what user3355655 means by "adjusted" for rain (since rain and year can't truly exist on the same x axis) but if we're willing to take the simplest course and simply treat rain as a "factor" then:
ggplot(data,aes(x=year,y=case,color=factor(rain))) + geom_point() + geom_smooth(method="lm",fill=NA) + scale_y_continuous(limits = c(-1, 10))
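Another possible reading of "adjusted", offered only as an assumption about what was meant, is to draw the year trend from the additive model with rain held fixed at its mean, next to the crude trend. The sketch below builds the two lines with predict; the data frame trend and its column names are invented for this example, and the plotting step runs only if ggplot2 is installed.

```r
data <- data.frame(
  x = 1:12,
  case = c(3, 5, 1, 8, 2, 4, 5, 0, 8, 2, 3, 5),
  rain = c(1, 8, 2, 1, 4, 5, 3, 0, 8, 2, 3, 4),
  country = rep(c("A", "B", "C"), each = 4),
  year = rep(2000:2003, 3)
)

model1 <- lm(case ~ year, data = data)          # crude trend
model2 <- lm(case ~ year + rain, data = data)   # trend adjusted for rain

# Predict both models over the years, holding rain at its mean (assumption)
grid <- data.frame(year = 2000:2003, rain = mean(data$rain))
crude <- data.frame(year = grid$year, case = predict(model1, grid),
                    model = "crude")
adjusted <- data.frame(year = grid$year, case = predict(model2, grid),
                       model = "adjusted (rain at mean)")
trend <- rbind(crude, adjusted)

# Overlay both lines on the raw points
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(x = year, y = case)) +
    geom_point() +
    geom_line(data = trend, aes(colour = model))
  print(p)
}
```

The slope of the adjusted line is the year coefficient of model2, so comparing the two lines shows how much the year effect changes once rain is in the model.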

PRC analysis with paired observations in vegan

This message is a copy of a message that I wrote on R-Forge. I would like to compute a Principal Response Curve (PRC) analysis on my data. I have several pairs of plots where deer browse the vegetation on Anticosti island, Québec. There are repeated observations of each plot over the course of 4 years. At each site, one plot is inside the enclosure (without deer, called "exclosure") and the other plot is outside the enclosure (with deer, called "control"). I would like to take into account the pairing of observations in and out of each enclosure in the PRC analysis. I would like to add another condition term to the PRC (like in partial RDA) to account for the paired observations, or extract values from a partial RDA computed with the PRC formula and plot them as is done in a PRC.
Moreover, I would like to use permutation tests to assess the significance of the difference between the two treatments. I want to test whether vegetation composition differs between the exclosure and the control over the years. So, I would like to know if there is a difference between the two treatments and, if there is, after how many years.
Does anybody know how to do this?
Here is the code of my PRC (without taking paired observations into account):
levels (treat)
[1] "controle" "exclosure"
levels (years)
[1] "0" "3" "5" "8"
prc.out <- prc(data.prc.spe.hell, treat, years)
species <- colSums(data.prc.spe.hell)
plot(prc.out, select = species > 5)
ctrl <- how(plots = Plots(strata = site,type = "free"),
within = Within(type = "series"), nperm = 99)
anova(prc.out, permutations = ctrl, first=TRUE)
Here is the result.
Thank you very much for your help!
I may have an answer for the first part of your question: "I would like to add an other condition term to the PRC (like in partial RDA) to consider the paired observations".
I am currently working on a similar case and this is what I came up with. Since Principal Response Curves (PRC) are a special case of RDA, and the objective is a kind of "partial PRC", I read the R documentation of the function rda() and found this: "If matrix Z is supplied, its effects are removed from the community matrix, and the residual matrix is submitted to the next stage."
So if I understand correctly, when you do a partial RDA with X, Y, Z (X = community matrix, Y = constraining matrix, Z = conditioning matrix), the first thing the function does is remove the effect of Z by taking the residual matrix of the RDA of X ~ Z.
If that is true, it is easy to do this step on its own and then use the residual matrix in your PRC:
library(vegan)
rda.out = rda(X ~ Z) # equivalent of "rda.out = rda(X ~ Condition(Z))"
rda.res = residuals(rda.out)
prc.out = prc(rda.res, treatment, time)
If you coded a dummy variable for your pairing effect, I think it should be a factor (as.factor()) and NOT numeric (as.numeric()).
I am not a stats expert, but it looks right to me. Even though it looks simple, I would appreciate it if someone could validate my answer.
Cheers

Using ggsurv function in R version 3.1.0

I am trying to create a survival plot in R for deaths from exposure to a fungal disease over a number of weeks. I have the death week(continuous), whether they were alive (TRUE/FALSE), as well as categorical variables for diet (high/low) and sex(male/female). I have run a coxph model:
surv1 <- coxph(Surv(week_died,alive) ~ exposed + diet + sex,
data=surv)
I would like to plot a survival curve, with individual lines for exposed males on high and low diets, and the same for females on high and low diets (resulting in 4 individual survival curves on the same plot). if I use this then I only get a single curve.
plot(survfit(surv1), ylim=c(), xlab="Weeks")
I have also tried to use the ggsurv function created by Edwin Thoen (http://www.r-statistics.com/2013/07/creating-good-looking-survival-curves-the-ggsurv-function/) but keep getting an error for "invalid line type". I have tried to work out what could be causing this and think it would be this last ifelse statement - but I am not sure.
pl <- if(strata == 1) {ggsurv.s(s, CI , plot.cens, surv.col ,
cens.col, lty.est, lty.ci,
cens.shape, back.white, xlab,
ylab, main)
} else {ggsurv.m(s, CI, plot.cens, surv.col ,
cens.col, lty.est, lty.ci,
cens.shape, back.white, xlab,
ylab, main)}
Does anyone have any idea what is causing this error and how to fix it, or whether I am going about plotting these curves completely the wrong way?
Many thanks!
survfit on a coxph model without any other specifications gives the survival curve for a case whose covariate predictors are the average of the population that the model was created with. From the help for survfit.coxph
Serious thought has been given to removing the default value for newdata, which is to use a single "psuedo" subject with covariate values equal to the means of the data set, since the resulting curve(s) almost never make sense. ... Two particularly egregious examples are factor variables and interactions. Suppose one were studying interspecies transmission of a virus, and the data set has a factor variable with levels ("pig", "chicken") and about equal numbers of observations for each. The “mean” covariate level will be 1/2 – is this a flying pig? ... Users are strongly advised to use the newdata argument.
So after you have computed surv1,
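# newdata must contain every covariate in the model, including exposed;
# subset the grid afterwards if only the exposed group is wanted.
sf <- survfit(surv1,
newdata = expand.grid(exposed = unique(surv$exposed),
diet = unique(surv$diet),
sex = unique(surv$sex)))
plot(sf)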
sf should also work as an argument to ggsurv, though I've not tested it.
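Since the asker's data are not available, here is a sketch of the same idea using the lung data set that ships with the survival package as a stand-in. The covariates age, sex and ph.ecog and the values chosen for the grid are assumptions made purely for illustration.

```r
library(survival)

# Cox model on the built-in lung data (stand-in for the asker's data)
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# One row per curve; every covariate in the model must appear in newdata
nd <- expand.grid(age = 60, sex = c(1, 2), ph.ecog = c(0, 1))

# One predicted survival curve per row of nd
sf <- survfit(fit, newdata = nd)

plot(sf, col = 1:4, xlab = "Days", ylab = "Survival probability")
legend("topright", col = 1:4, lty = 1,
       legend = paste("sex", nd$sex, "/ ecog", nd$ph.ecog))
```

Each row of the newdata grid produces its own curve, which is how the four diet-by-sex curves in the question would be obtained.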
