I have a dataset with three columns: date, amount, and a factor/cluster ID. For example:
date;amount;cluster_id
02.10.10;-13,86;3
04.10.10;-66,28;3
06.10.10;-14,99;3
25.10.10;-20,96;3
30.10.10;-408,99;3
31.01.11;-29,5;2
07.02.11;-652,85;3
19.09.11;-277,48;3
30.09.11;-6,18;3
03.10.11;-242,47;3
04.11.11;-299,77;3
20.02.12;-367,85;3
03.10.12;-4,99;4
13.09.13;-6,59;4
14.10.13;-1043,46;3
24.10.13;-373,99;3
24.10.13;-1321,91;3
18.12.13;-24,45;4
03.02.14;-66,87;3
30.08.14;-7,6;2
28.10.14;-115;3
13.12.14;-8,99;3
15.12.14;-352,44;3
19.12.14;115;3
08.07.15;-59;2
The following code:
library(ggplot2)
# data$date must be a Date (not text) so that x is continuous
ggplot(data, aes(x = date, y = amount, colour = factor(cluster_id))) +
  stat_smooth(method = MASS::rlm, formula = y ~ x)
simply fits a separate rlm per group/factor, so the plot shows one robust regression line per cluster.
How can I combine the separate regression models into one big (added) model, so that I can plot a single "combined" model easily, i.e. without looping over all the rlm models manually?
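A minimal sketch of two possible readings of "combined", assuming the sample above is the whole dataset and that MASS (which provides rlm and ships with R) is available: (1) a single robust fit to the pooled data, ignoring the clusters, and (2) the sum of the per-cluster rlm lines, evaluated on a common monthly date grid. The grid, its monthly spacing, and the column names `pooled` and `combined` are illustrative choices, not part of the original code.

```r
library(MASS)  # rlm()

# The sample from the question: semicolon-separated, decimal commas, dd.mm.yy dates
txt <- "date;amount;cluster_id
02.10.10;-13,86;3
04.10.10;-66,28;3
06.10.10;-14,99;3
25.10.10;-20,96;3
30.10.10;-408,99;3
31.01.11;-29,5;2
07.02.11;-652,85;3
19.09.11;-277,48;3
30.09.11;-6,18;3
03.10.11;-242,47;3
04.11.11;-299,77;3
20.02.12;-367,85;3
03.10.12;-4,99;4
13.09.13;-6,59;4
14.10.13;-1043,46;3
24.10.13;-373,99;3
24.10.13;-1321,91;3
18.12.13;-24,45;4
03.02.14;-66,87;3
30.08.14;-7,6;2
28.10.14;-115;3
13.12.14;-8,99;3
15.12.14;-352,44;3
19.12.14;115;3
08.07.15;-59;2"
data <- read.csv2(text = txt)
data$date <- as.Date(data$date, format = "%d.%m.%y")

# Common date grid on which every model is evaluated
grid <- data.frame(date = seq(min(data$date), max(data$date), by = "month"))

# Reading 1: one robust regression on the pooled data
pooled <- rlm(amount ~ as.numeric(date), data = data)
grid$pooled <- predict(pooled, newdata = grid)

# Reading 2: one rlm per cluster, then add the fitted lines point by point
per_cluster <- lapply(split(data, data$cluster_id),
                      function(d) rlm(amount ~ as.numeric(date), data = d))
grid$combined <- Reduce(`+`, lapply(per_cluster, predict, newdata = grid))

# Clusters plus both combined lines (only if ggplot2 is installed; print(p) draws it)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(date, amount, colour = factor(cluster_id))) +
    geom_point() +
    stat_smooth(method = MASS::rlm, formula = y ~ x, se = FALSE) +
    geom_line(data = grid, aes(date, combined), colour = "black", inherit.aes = FALSE) +
    geom_line(data = grid, aes(date, pooled), colour = "grey50", linetype = "dashed",
              inherit.aes = FALSE)
}
```

Because each cluster line is linear in date, their sum is again a straight line. Note that the sum extrapolates every cluster's line over dates where that cluster has no observations, and clusters 2 and 4 have only three points each.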
Related
I have two groups and an x and a y variable. I want to run a linear regression to see whether there is a significant relationship between x and y within each group, and also whether that relationship differs significantly between the groups. Then I would like to plot the results and show the p-value, equation, and R^2 on the graph. How would I go about accomplishing this?
I am able to plot the data on the same graph using this code:
ggplot(data_NeuroPsych, aes(x = Flanker_Ratio, y = Neuropsych_Delta, color = Group)) +
geom_point() +
geom_smooth(method = "lm", fill = NA)
Then, using this open-source code, I was able to look at the results separately: https://github.com/kassambara/ggpubr/blob/master/R/stat_regline_equation.R#L7
The issue with this is that the results are not on the same plot, and it does not test the difference between the groups.
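A minimal sketch of one common approach, on simulated stand-in data (the group labels "Control"/"Patient", the sample size, and the true slopes are made up, since the original data are not shown): fit a separate lm per group to get each within-group slope, R^2 and p-value, then compare a common-slope model against one with a Flanker_Ratio × Group interaction; that F-test asks whether the slopes differ between the groups. The resulting label strings are ready for geom_text.

```r
# Hypothetical stand-in for data_NeuroPsych (simulated, not the real data)
set.seed(7)
n <- 30
data_NeuroPsych <- data.frame(
  Group = rep(c("Control", "Patient"), each = n),
  Flanker_Ratio = runif(2 * n, 0.8, 1.4),
  stringsAsFactors = FALSE
)
true_slope <- ifelse(data_NeuroPsych$Group == "Control", 2, -3)
data_NeuroPsych$Neuropsych_Delta <- true_slope * data_NeuroPsych$Flanker_Ratio +
  rnorm(2 * n, sd = 0.5)

# Within each group: intercept, slope, R^2 and the p-value of the slope
per_group <- do.call(rbind, lapply(split(data_NeuroPsych, data_NeuroPsych$Group), function(d) {
  s <- summary(lm(Neuropsych_Delta ~ Flanker_Ratio, data = d))
  data.frame(Group = d$Group[1],
             intercept = s$coefficients["(Intercept)", "Estimate"],
             slope = s$coefficients["Flanker_Ratio", "Estimate"],
             r2 = s$r.squared,
             p = s$coefficients["Flanker_Ratio", "Pr(>|t|)"],
             stringsAsFactors = FALSE)
}))
per_group$label <- sprintf("%s: y = %.2fx %s %.2f, R^2 = %.2f, p = %.3g",
                           per_group$Group, per_group$slope,
                           ifelse(per_group$intercept < 0, "-", "+"),
                           abs(per_group$intercept), per_group$r2, per_group$p)

# Between groups: do group-specific slopes (the interaction) improve the fit?
common_slope <- lm(Neuropsych_Delta ~ Flanker_Ratio + Group, data = data_NeuroPsych)
separate_slopes <- lm(Neuropsych_Delta ~ Flanker_Ratio * Group, data = data_NeuroPsych)
interaction_p <- anova(common_slope, separate_slopes)[["Pr(>F)"]][2]

# Stack the labels in the top-left corner (only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  y_top <- max(data_NeuroPsych$Neuropsych_Delta)
  y_step <- 0.08 * diff(range(data_NeuroPsych$Neuropsych_Delta))
  per_group$x <- min(data_NeuroPsych$Flanker_Ratio)
  per_group$y <- y_top - (seq_len(nrow(per_group)) - 1) * y_step
  p <- ggplot(data_NeuroPsych, aes(Flanker_Ratio, Neuropsych_Delta, colour = Group)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    geom_text(data = per_group, aes(x, y, label = label), hjust = 0, show.legend = FALSE) +
    labs(subtitle = sprintf("Slope difference (interaction) p = %.3g", interaction_p))
}
```

With exactly two groups, the anova p-value equals the p-value of the interaction coefficient in summary(separate_slopes); the anova form also extends to more than two groups.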
I have a model that was created like this:
cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d.r.data)
I have been using ggplot2 functions such as geom_point to plot the data points and geom_smooth to plot the regression line. Now I want to plot the fitted data vs. the observed data. How would I do that? I'm unfamiliar with R, so I'm not sure what to use here.
--
EDIT
I ended up doing this
predicted <- predict(cube_model)
ggplot() + geom_point(aes(x, y)) + geom_line(aes(x, predicted))
Is this the correct approach?
What you need to do is use the predict function to generate the fitted values. You can then add them back to your data.
d.r.data$fit <- predict(cube_model)
If you want to plot the predicted values vs the actual values, you can use something like the following.
library(ggplot2)
ggplot(d.r.data) +
geom_point(aes(x = fit, y = y))
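A minimal sketch of both plots on simulated stand-in data (the true curve and noise level are made up; only the names d.r.data, x, y and cube_model come from the question): fitted() gives one fitted value per observed row for an observed-vs-fitted plot, and predict() on an evenly spaced x grid gives a smooth cubic curve to draw over the raw points.

```r
# Hypothetical stand-in for d.r.data (simulated)
set.seed(3)
d.r.data <- data.frame(x = runif(50, -2, 2))
d.r.data$y <- 1 + d.r.data$x - 0.5 * d.r.data$x^3 + rnorm(50, sd = 0.3)
cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d.r.data)

# Fitted value for every observed row (same as predict(cube_model) without newdata)
d.r.data$fit <- fitted(cube_model)

# Evenly spaced x values so the cubic curve is drawn smoothly
curve_df <- data.frame(x = seq(min(d.r.data$x), max(d.r.data$x), length.out = 200))
curve_df$fit <- predict(cube_model, newdata = curve_df)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Observed points with the fitted cubic curve
  p_curve <- ggplot(d.r.data, aes(x, y)) +
    geom_point() +
    geom_line(data = curve_df, aes(x, fit), colour = "blue")
  # Observed vs fitted: points close to the dashed 1:1 line indicate a good fit
  p_obs_fit <- ggplot(d.r.data, aes(fit, y)) +
    geom_point() +
    geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
    labs(x = "Fitted", y = "Observed")
}
```

The same curve can also be drawn directly with geom_smooth(method = "lm", formula = y ~ x + I(x^2) + I(x^3)), since geom_smooth refits the model with that formula.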
I fitted a linear mixed model using the nlme package. I was evaluating a psychological treatment and used treatment condition and measurement point as predictors. I did post-hoc comparisons using the emmeans package. So far so good: everything worked out well and I am looking forward to finishing my thesis. There is only one problem left: I am really bad at plotting. I want to plot the emmeans for the four measurement points for each group. The emmip function in emmeans does this, but I am not happy with the result. I used the following code to generate it:
emmip(HLM_IPANAT_pos, Gruppe~TP, CIs=TRUE) + theme_bw() + labs(x = "Zeit", y = "IPANAT-PA")
I don't like the way the confidence intervals are presented. I would prefer a line chart with "normal" confidence bars, like the one in Ireland et al. (2017). I tried to do it in Excel, but could not find out how to add separate confidence intervals for each line. So I was wondering whether it is possible to do it using ggplot2. However, I do not know how to get the values I obtained with emmeans into ggplot. As I said, I really have no idea about plotting. Does anyone know how to do it?
I think it is possible. Rather than using emmip to create the plot, you could use emmeans to get the values for ggplot2. With ggplot2 and the data, you might be able to better control the format of the plot. Since I do not have your data, I can only suggest a few steps.
First, after fitting the model HLM_IPANAT_pos, get the estimated marginal means with emmeans. Second, convert that object into a data frame with broom::tidy. Third, plot the tidied data frame with ggplot.
Using the mtcars data as an example:
library(emmeans)
# mtcars data
mtcars$cyl = as.factor(mtcars$cyl)
# Model
mymodel <- lm(mpg ~ cyl * am, data = mtcars)
# using ggplot2
library(tidyverse)
broom::tidy(emmeans(mymodel, ~ am | cyl), conf.int = TRUE) %>%  # conf.int = TRUE adds conf.low/conf.high
  mutate(cyl_x = as.numeric(as.character(cyl)) + 0.1 * am) %>%  # shift am = 1 slightly so bars don't overlap
  ggplot(aes(x = cyl_x, y = estimate, color = as.factor(am))) +
  geom_point() +
  geom_line() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.1)
Created on 2019-12-29 by the reprex package (v0.3.0)
Goal: I want to obtain regressions (ggplot curves and model parameters) for growth curves with multiple treatments.
I have data for bacterial cultures C={a,b,c,d} growing on nutrient sources N={x,y}.
Their idealized growth curves (turbidity of the cell culture measured every hour) look something like the plot produced by the code below.
There are 8 different curves to obtain coefficients and curves for. How can I do this in one go for my data frame, feeding the different treatments as different groups into the nonlinear regression?
Thanks!!!
This question is similar to an unanswered question posted here.
(Source code for the idealized data; sorry it's not elegant, I'm not a computer scientist):
a<-1:20
a[1]<-0.01
for(i in c(1:19)){
a[i+1]<-1.3*a[i]*(1-a[i])
}
b<-1:20
b[1]<-0.01
for(i in c(1:19)){
b[i+1]<-1.4*b[i]*(1-b[i])
}
c<-1:20
c[1]<-0.01
for(i in c(1:19)){
c[i+1]<-1.5*c[i]*(1-c[i])
}
d<-1:20
d[1]<-0.01
for(i in c(1:19)){
d[i+1]<-1.6*d[i]*(1-d[i])
}
sub.data<-cbind(a,b,c,d)
library(reshape2)  # melt()
library(ggplot2)
data<-melt(sub.data, value.name = "OD600")
data$nutrition<-rep(c("x", "y"), each=5, times=4)
colnames(data)[1:2]<-c("Time", "Culture")
ggplot(data, aes(x = Time, y = OD600, color = Culture, group=nutrition)) +
theme_bw() + xlab("Time/hr") + ylab("OD600") +
geom_point() + facet_wrap(~nutrition, scales = "free")
If you are familiar with the group_by function from dplyr (included in the tidyverse), you can group your data by Culture and nutrition, fit a model to each group, and tidy the results with broom. I think this vignette is getting at exactly what you are trying to accomplish. Here is the code all in one go:
library(tidyverse)
library(broom)
library(mgcv) #For the gam model
data %>%
group_by(Culture, nutrition) %>%
do(fit = gam(OD600 ~ s(Time), data = ., family = gaussian())) %>% # Change this to whatever model you want (e.g., nonlinear regression, sigmoid)
#do(fit = lm(OD600 ~ Time, data = .)) %>% # Example using linear regression
augment(fit) %>%
ggplot(aes(x = Time, y = OD600, color = Culture)) + # No need to group by nutrition because that is broken out in the facet_wrap
theme_bw() + xlab("Time/hr") + ylab("OD600") +
geom_point() + facet_wrap(~nutrition, scales = "free") +
geom_line(aes(y = .fitted, group = Culture))
If you don't need it all in one go, break apart the %>% chain to see what each step does. I used a GAM, which overfits here, but you could replace it with whatever model you want, including a sigmoid.
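Since the question asks for nonlinear regression rather than a GAM, here is a minimal sketch that fits a three-parameter logistic curve, OD600 = Asym / (1 + exp((xmid - Time) / scal)), to each Culture × nutrition group with base R's nls() and the self-starting model SSlogis(), using split()/lapply() instead of dplyr. It runs on simulated stand-in data with complete hourly curves; the asymptotes, midpoints and noise level are made up. The logistic form is an assumption about the growth curves, and nls() can fail to converge on short or flat series.

```r
# Hypothetical simulated data: 4 cultures x 2 nutrient sources, OD600 every hour
set.seed(1)
sim <- expand.grid(Time = 0:24, Culture = c("a", "b", "c", "d"),
                   nutrition = c("x", "y"), stringsAsFactors = FALSE)
asym <- c(a = 0.5, b = 0.6, c = 0.7, d = 0.8)[sim$Culture] *
  ifelse(sim$nutrition == "x", 1, 0.8)
xmid <- ifelse(sim$nutrition == "x", 8, 12)
sim$OD600 <- asym / (1 + exp((xmid - sim$Time) / 1.5)) + rnorm(nrow(sim), sd = 0.01)

# One logistic fit per Culture x nutrition group; SSlogis computes starting values
groups <- split(sim, list(sim$Culture, sim$nutrition), drop = TRUE)
fits <- lapply(groups, function(d) nls(OD600 ~ SSlogis(Time, Asym, xmid, scal), data = d))

# Model parameters: one row per group (group names look like "a.x")
params <- do.call(rbind, lapply(names(fits), function(g) {
  data.frame(group = g, t(coef(fits[[g]])))
}))

# Fitted curves on a fine time grid, one block of rows per group
curves <- do.call(rbind, lapply(names(groups), function(g) {
  d <- groups[[g]]
  nd <- data.frame(Time = seq(min(d$Time), max(d$Time), length.out = 100))
  data.frame(Culture = d$Culture[1], nutrition = d$nutrition[1], nd,
             OD600 = as.numeric(predict(fits[[g]], newdata = nd)))
}))

# Same layout as the question's plot, with the fitted curves added
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sim, aes(Time, OD600, colour = Culture)) +
    geom_point() +
    geom_line(data = curves) +
    facet_wrap(~ nutrition, scales = "free") +
    theme_bw() + xlab("Time/hr") + ylab("OD600")
}
```

The params table holds the coefficients for all 8 curves, and curves holds the fitted lines in the same long format as the data, which is what geom_line needs.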
I'm making plots with an lm fit, and for each panel I want to compute and display the equation y = ax + b together with R². How can I do it?
library(ggplot2)
lmfit <- geom_smooth(method = "lm", formula = y ~ x, se = TRUE)
p <- ggplot(Tab, aes(x, y)) + geom_point() + facet_grid(id ~ ., scales = "free") + lmfit
Within ggplot2 itself, there is no direct way to do this. You need to fit the regression separately for each id and then extract the equation and R^2 from each fit. Put those extracted values in a data frame (along with id) and use geom_text to display them.
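Following that recipe, a minimal sketch with a simulated stand-in for Tab (three ids with made-up slopes; only the column names id, x and y come from the question): split() the data by id, fit lm on each piece, collect a, b and R^2 into a data frame with one label per id, and pass it to geom_text so each facet gets its own equation.

```r
# Hypothetical stand-in for Tab (simulated)
set.seed(42)
Tab <- data.frame(id = rep(c("A", "B", "C"), each = 20), x = rep(1:20, times = 3),
                  stringsAsFactors = FALSE)
Tab$y <- c(A = 1, B = 2, C = -1)[Tab$id] * Tab$x + 5 + rnorm(nrow(Tab))

# One lm per id; keep slope a, intercept b, R^2 and a ready-made label
eq_labels <- do.call(rbind, lapply(split(Tab, Tab$id), function(d) {
  fit <- lm(y ~ x, data = d)
  a <- coef(fit)[["x"]]
  b <- coef(fit)[["(Intercept)"]]
  r2 <- summary(fit)$r.squared
  data.frame(id = d$id[1], a = a, b = b, r2 = r2,
             label = sprintf("y = %.2fx %s %.2f, R^2 = %.3f",
                             a, if (b < 0) "-" else "+", abs(b), r2),
             x = min(d$x), y = max(d$y),  # top-left corner of each panel
             stringsAsFactors = FALSE)
}))

# The id column lets facet_grid put each label in its own panel
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Tab, aes(x, y)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
    geom_text(data = eq_labels, aes(x, y, label = label), hjust = 0, vjust = 1) +
    facet_grid(id ~ ., scales = "free")
}
```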