When using the mammal dataset in R, I am trying to fit a hierarchical model using the danger variable. However, with the code below I get a jagged line rather than a straight one. It seems the model is not generalising well: it is fitting our data too closely, and I need a linear relationship here. Does anyone know how to solve this? The code is below.
FYI, body and gestation have already been log-transformed here; the model won't work without this.
# random intercept model (the intercept varies by danger; the slopes are fixed)
hie3 = lmer(dream ~ body + gestation + (1|danger), data = mammal)
summary(hie3)
mammal$hie3_predictions = predict(hie3)
hie_plot3 = ggplot(aes(x = body + gestation, y = hie3_predictions,
color = danger), data = mammal) +
geom_line(size=1) +
geom_point(aes(y = dream))
hie_plot3
The data set used is available in R as standard.
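One possible explanation, offered as a sketch rather than a confirmed diagnosis: the predictions depend on body and gestation through two different coefficients, so they are not a function of body + gestation alone, and plotting them against that sum gives a zig-zag. The example below uses simulated data (all values hypothetical) and a plain lm() with danger as a fixed effect as a stand-in for the lmer() fit. It gets straight lines by predicting over a grid in which body varies and gestation is held at its mean.

```r
# Simulated stand-in for the mammal data (hypothetical values)
set.seed(42)
n <- 60
sim <- data.frame(
  body      = rnorm(n),
  gestation = rnorm(n),
  danger    = factor(sample(1:3, n, replace = TRUE))
)
sim$dream <- 2 - 0.3 * sim$body - 0.8 * sim$gestation +
  0.2 * as.numeric(sim$danger) + rnorm(n, sd = 0.2)

# lm() with danger as a fixed effect, standing in for lmer(... + (1 | danger))
fit <- lm(dream ~ body + gestation + danger, data = sim)

# Grid: vary body, hold gestation at its mean, one line per danger level
grid <- expand.grid(
  body      = seq(min(sim$body), max(sim$body), length.out = 20),
  gestation = mean(sim$gestation),
  danger    = levels(sim$danger)
)
grid$pred <- predict(fit, newdata = grid)

# Observed points plus one straight predicted line per danger level
plot(dream ~ body, data = sim, col = as.integer(sim$danger), pch = 19)
for (lev in levels(sim$danger)) {
  g <- grid[grid$danger == lev, ]
  lines(g$body, g$pred, col = which(levels(sim$danger) == lev), lwd = 2)
}
```

With an actual lmer() model the same idea should apply by passing the grid to predict() as newdata.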
So I have 2 groups and an x and y variable. I am trying to run a linear regression to see if there is a significant relationship between the x and y variables within each group but I also want to look at the significance between groups. Then I would like to plot those results and provide a p-value, equation, and R^2 value on the graph. How would I go about accomplishing this?
I am able to plot the data on the same graph using this code:
ggplot(data_NeuroPsych, aes(x = Flanker_Ratio, y = Neuropsych_Delta, color = Group)) +
geom_point() +
geom_smooth(method = "lm", fill = NA)
Then using this open source code I was able to look at the results separately: https://github.com/kassambara/ggpubr/blob/master/R/stat_regline_equation.R#L7
The issue with the above is the data is not on the same plot and it does not look at the comparison between groups.
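A common way to get both the within-group fits and a between-group comparison is a single model with an interaction term, plus one small model per group for the individual p-values and R^2. The sketch below uses simulated data (the values and the group labels "A" and "B" are hypothetical; only the column names come from the question).

```r
# Simulated stand-in for data_NeuroPsych (hypothetical values)
set.seed(1)
d <- data.frame(
  Group         = factor(rep(c("A", "B"), each = 30)),
  Flanker_Ratio = runif(60)
)
d$Neuropsych_Delta <- ifelse(d$Group == "A",
                             1.0 + 2.0 * d$Flanker_Ratio,
                             0.5 + 0.5 * d$Flanker_Ratio) + rnorm(60, sd = 0.3)

# Interaction model: the "Flanker_Ratio:GroupB" row tests whether slopes differ
fit_int <- lm(Neuropsych_Delta ~ Flanker_Ratio * Group, data = d)
print(coef(summary(fit_int)))

# Separate fit per group: slope, its p-value, and R^2
per_group <- lapply(split(d, d$Group), function(g) {
  summary(lm(Neuropsych_Delta ~ Flanker_Ratio, data = g))
})
group_stats <- sapply(per_group, function(s) {
  c(slope = s$coefficients[2, 1],
    p     = s$coefficients[2, 4],
    r2    = s$r.squared)
})
print(group_stats)
```

The numbers in group_stats could then be pasted into plot labels, for example with annotate() in ggplot2.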
I'm a beginner at R and I can't figure out how to add regression lines to my boxplot. My code (with data) is:
library(dplyr)    # for %>% and mutate()
library(ggplot2)
dat_full<-data.frame(Fuerza = c("19.6N","19.6N","58.8N","58.8N","98,0N","98,0N", "274.4N","274.4N"),
Músculo = c("Bíceps","Tríceps","Bíceps","Tríceps","Bíceps","Tríceps","Bíceps","Tríceps"),
mV.s = c(3.5227565, -0.0897375, 7.2907255, 1.8571375, 16.327445, 8.042295, 31.15557, 12.69073),
standdev = c(0.111590642, 0.187825239, 0.886093185, 0.16351915, 3.876932131, 2.637289091, 3.713413688, 1.262850285))
dat_full<- dat_full %>%
mutate(Fuerza = factor(Fuerza, levels=c("19.6N","58.8N","98,0N","274.4N")))
dat_full
ggplot(dat_full, aes(x = as.factor(Fuerza),y=mV.s)) +
geom_boxplot(aes(lower = mV.s - standdev, upper = mV.s + standdev, middle = mV.s,
ymin = mV.s - 3*standdev, ymax = mV.s + 3*standdev), stat = "identity")+
facet_wrap(~Músculo)+
xlab("Fuerza (N)")+
theme_grey(base_size = 22)
which shows this plot
What I need to do is add a regression line through the means (mV.s) of every condition (Fuerza) for the two groups. If it's possible, I also want to show R2 and the regression equation on the graph.
Thanks in advance.
You can add a line to a ggplot using geom_smooth(), or fit the regression yourself with lm(). Given the line you need to create, it may be easier to fit it with lm() first.
lm() takes a formula and a data argument. Here what you'd want to do is {name_of_regression} <- lm({dependent_var} ~ {independent_var}, data = dat_full). I'm not sure what you want those variables to be, as Fuerza is currently populated with string values.
Also, it's been a little while since I've looked at R, so this is a somewhat verbose solution, but you can filter triceps and biceps into two datasets using the tidyverse package and then make your regressions from each dataset.
library(tidyverse)
biceps <- filter(dat_full, Músculo == "Bíceps")
biceps_reg <- lm({biceps_dep} ~ {biceps_indep}, data = biceps)
And repeat for triceps.
Then, make the ggplot you want, and using geom_smooth() insert your lm using:
ggplot({some_code}) +
{...} +
geom_smooth(method="lm", se = FALSE)
I know that doesn't really solve your problem of wanting to put the charts together, but you can save each ggplot for biceps and triceps and then combine them once you're done, for example with gridExtra::grid.arrange().
Also, here's an R tip: in RStudio you can check any function by using something like:
?plot
?lm
Apologies for the verbosity -- I wanted to provide a quick fix here, but others may have better advice. Additionally, please let me know what your independent and dependent variables would be for the regression (they have to both be numeric here, so Fuerza won't work).
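As a rough sketch of the per-muscle regressions described above, assuming the force labels are meant as numeric newtons: the code below strips the trailing "N", treats the comma in "98,0N" as a decimal point, and fits one lm() per muscle. The column fuerza_num and the unaccented names Musculo, Biceps and Triceps are introduced here for illustration only; the numbers are the ones from the question.

```r
# Data from the question, with unaccented names to avoid encoding issues
dat <- data.frame(
  Fuerza  = c("19.6N", "19.6N", "58.8N", "58.8N",
              "98,0N", "98,0N", "274.4N", "274.4N"),
  Musculo = rep(c("Biceps", "Triceps"), 4),
  mV.s    = c(3.5227565, -0.0897375, 7.2907255, 1.8571375,
              16.327445, 8.042295, 31.15557, 12.69073)
)

# Assumption: labels are forces in newtons; drop "N", use "." as decimal mark
dat$fuerza_num <- as.numeric(sub(",", ".", sub("N$", "", dat$Fuerza)))

# One linear regression of the mean signal on force per muscle
fits <- lapply(split(dat, dat$Musculo), function(d) {
  lm(mV.s ~ fuerza_num, data = d)
})

# Intercept, slope and R^2 for each muscle
reg_summary <- sapply(fits, function(f) {
  c(coef(f), r2 = summary(f)$r.squared)
})
print(reg_summary)
```

Note that drawing these lines on the boxplot itself would also need a numeric x axis, since a factor axis places the four forces at equal spacing.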
Here's my code:
tmp <- data.frame(t_year = rnorm(100,0,1),
labs = c(rep("Linear",50), rep("Spline",50)),
STUDY_PARTICIPANT_ID = rep(seq(1,50),2),
logpsa = rnorm(100,0.5,1),
mypredict = rnorm(100,1,2))
p <- ggplot(tmp) +
geom_line(aes(t_year,
mypredict,
group = as.factor(labs),
color = as.factor(labs))) +
geom_line(aes(t_year,
logpsa,
group = STUDY_PARTICIPANT_ID,
color = STUDY_PARTICIPANT_ID))
It runs with either geom_line() on its own, but fails when I try to plot both. I was hoping it would treat them separately, but I don't think that's the case. Does anyone have any suggestions? I originally used geom_smooth() for the fitted lines, but I was unable to add a legend at the side of the ggplot. I therefore computed the fitted values, put them in the dataset, and was going to plot them with geom_line(). All I want is a label for my linear fit line and one for my spline. The data here doesn't show the trend, but it reproduces the error messages I was getting. Thank you for your patience with my first post.
I have a data set with some points in it and want to fit a line to it. I tried the loess function, but I get very strange results (see the plot below). I expected a line that follows the points and spans the whole plot. How can I achieve that?
How to reproduce it:
Download the dataset from https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1 (only two kb) and use this code:
load(url('https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1'))
lw1 = loess(y ~ x,data=data)
plot(y ~ x, data=data,pch=19,cex=0.1)
lines(data$y,lw1$fitted,col="blue",lwd=3)
Any help is greatly appreciated. Thanks!
You've plotted fitted values against y instead of against x. Also, you will need to order the x values before plotting a line. Try this:
lw1 <- loess(y ~ x,data=data)
plot(y ~ x, data=data,pch=19,cex=0.1)
j <- order(data$x)
lines(data$x[j],lw1$fitted[j],col="red",lwd=3)
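Since the original data file is no longer guaranteed to be available, here is the same fix on simulated data (the sine-plus-noise values are made up for illustration). The key step is ordering by x before calling lines(), because lines() connects points in the order they are given.

```r
# Simulated data in random x order (hypothetical values)
set.seed(123)
sim <- data.frame(x = runif(200, 0, 10))
sim$y <- sin(sim$x) + rnorm(200, sd = 0.3)

fit <- loess(y ~ x, data = sim)

# Sort by x so the fitted curve is drawn left to right
j <- order(sim$x)

plot(y ~ x, data = sim, pch = 19, cex = 0.3)
lines(sim$x[j], fit$fitted[j], col = "red", lwd = 3)
```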
Unfortunately the data are not available anymore, but an easier way to fit a non-parametric line (locally weighted scatterplot smoothing, or just LOESS if you want) is to use the following code, with x and y as numeric vectors in the workspace:
scatter.smooth(y ~ x, span = 2/3, degree = 2)
Note that you can play with parameters span and degree to get arbitrary smoothness.
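To illustrate the effect of span, here is a small sketch on simulated data (values hypothetical) that fits loess() twice and compares the residual sums of squares. A smaller span follows the points more closely; a larger one gives a smoother, stiffer curve.

```r
# Simulated data (hypothetical values)
set.seed(7)
x <- seq(0, 10, length.out = 300)
y <- sin(x) + rnorm(300, sd = 0.2)

# Same degree, two different spans
fit_local  <- loess(y ~ x, span = 0.2, degree = 2)
fit_global <- loess(y ~ x, span = 0.9, degree = 2)

# Residual sum of squares for each fit
rss <- c(span_0.2 = sum(residuals(fit_local)^2),
         span_0.9 = sum(residuals(fit_global)^2))
print(rss)

plot(x, y, pch = 19, cex = 0.3)
lines(x, fitted(fit_local),  col = "red",  lwd = 2)
lines(x, fitted(fit_global), col = "blue", lwd = 2)
```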
Maybe it is too late, but you have options with ggplot (and dplyr). First, if you only want to plot a loess line over the points, you can try:
library(ggplot2)
load(url("https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1"))
ggplot(data, aes(x, y)) +
geom_point() +
geom_smooth(method = "loess", se = FALSE)
Another way is to use the predict() function on a loess fit. For instance, I used dplyr functions to add the predictions to a new column called "loess":
library(dplyr)
data %>%
mutate(loess = predict(loess(y ~ x, data = data))) %>%
ggplot(aes(x, y)) +
geom_point(color = "grey50") +
geom_line(aes(y = loess))
Update: Added line of code to load the example data provided
Update 2: Corrected the geom_smooth() function name, according to #phi's comment
I am trying to produce some example graphics using ggplot2, and one of the examples I picked was the birthday problem, here using code 'borrowed' from a Revolution Computing presentation at OSCON.
birthday<-function(n){
ntests<-1000
pop<-1:365
anydup<-function(i){
any(duplicated(sample(pop,n,replace=TRUE)))
}
sum(sapply(seq(ntests), anydup))/ntests
}
library(plyr)     # for ddply()
library(ggplot2)
x<-data.frame(x=rep(1:100, each=5))
x<-ddply(x, .(x), function(df) {return(data.frame(x=df$x, prob=birthday(df$x)))})
birthdayplot<-ggplot(x, aes(x, prob))+
geom_point()+geom_smooth()+
theme_bw()+
ggtitle("Probability that at least two people share a birthday in a random group")+
labs(x="Size of Group", y="Probability")
Here my graph is what I would describe as exponential, but the geom_smooth doesn't fit the data particularly well. I've tried the loess method but this didn't change things much. Can anyone suggest how to add a better smooth ?
Thanks
Paul.
The smoothing routine does not react fast enough to the sudden change at low values of x (and it has no way of knowing that the values of prob are restricted to the 0-1 range). Since you have such low variability, a quick solution is to reduce the span of values over which the smoothing at each point is done. Check out the red line in this plot:
birthdayplot + geom_smooth(span=0.1, colour="red")
The problem is that the probabilities follow a logistic curve. You could fit a proper smoothing line if you change the birthday function to return the raw successes and failures instead of the probabilities.
birthday<-function(n){
ntests<-1000
pop<-1:365
anydup<-function(i){
any(duplicated(sample(pop,n,replace=TRUE)))
}
data.frame(Dups = sapply(seq(ntests), anydup) * 1, n = n)
}
x<-ddply(x, .(x),function(df) birthday(df$x))
Now, you'll have to add the points as a summary, and specify a logistic regression as the smoothing type.
ggplot(x, aes(n, Dups)) +
stat_summary(fun = mean, geom = "point") +
stat_smooth(method = "glm", method.args = list(family = binomial))
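The same idea can be sketched without ggplot2, assuming a plain logistic regression of the 0/1 outcome on group size is what the smoother fits. The group sizes, the 200 simulations per size, and the helper has_dup() are choices made here for illustration. The exact birthday probability is included for comparison, since a logistic curve in n is only an approximation to it.

```r
set.seed(1)

# TRUE if at least two of n random birthdays coincide
has_dup <- function(n) any(duplicated(sample(365, n, replace = TRUE)))

# 200 simulated groups for each size 5, 10, ..., 100
sims <- data.frame(n = rep(seq(5, 100, by = 5), each = 200))
sims$dup <- as.integer(vapply(sims$n, has_dup, logical(1)))

# Logistic regression on the raw successes and failures
fit <- glm(dup ~ n, family = binomial, data = sims)

# Fitted and exact probabilities on a grid of group sizes
grid <- data.frame(n = 1:100)
grid$fitted <- predict(fit, newdata = grid, type = "response")
grid$exact  <- sapply(grid$n, function(n) {
  1 - prod((365 - seq_len(n) + 1) / 365)
})

plot(grid$n, grid$exact, type = "l", lwd = 2,
     xlab = "Size of Group", ylab = "Probability")
lines(grid$n, grid$fitted, col = "red", lwd = 2)
```

Unlike a loess smooth, the fitted values from the binomial model always stay between 0 and 1.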