How to add regression line to boxplot per group (ggplot2)? [duplicate] - r

This question already has an answer here:
Overlay geom_line within a categorical x axis for each group - ggplot2
(1 answer)
Closed 1 year ago.
I'm a beginner at R and I can't figure out how to add regression lines to my boxplot. My code (with data) is:
library(dplyr)    # for %>% and mutate()
library(ggplot2)

dat_full <- data.frame(
  Fuerza = c("19.6N","19.6N","58.8N","58.8N","98,0N","98,0N","274.4N","274.4N"),
  Músculo = c("Bíceps","Tríceps","Bíceps","Tríceps","Bíceps","Tríceps","Bíceps","Tríceps"),
  mV.s = c(3.5227565, -0.0897375, 7.2907255, 1.8571375, 16.327445, 8.042295, 31.15557, 12.69073),
  standdev = c(0.111590642, 0.187825239, 0.886093185, 0.16351915, 3.876932131, 2.637289091, 3.713413688, 1.262850285))
# Order the force levels by magnitude rather than alphabetically
dat_full <- dat_full %>%
  mutate(Fuerza = factor(Fuerza, levels = c("19.6N","58.8N","98,0N","274.4N")))
dat_full
# Boxes are drawn from precomputed statistics: mean +/- 1 SD (box), mean +/- 3 SD (whiskers)
ggplot(dat_full, aes(x = as.factor(Fuerza), y = mV.s)) +
  geom_boxplot(aes(lower = mV.s - standdev, upper = mV.s + standdev, middle = mV.s,
                   ymin = mV.s - 3*standdev, ymax = mV.s + 3*standdev), stat = "identity") +
  facet_wrap(~Músculo) +
  xlab("Fuerza (N)") +
  theme_grey(base_size = 22)
which shows this plot
What I need to do is add a regression line through the means (mV.s) of every condition (Fuerza) for each of the two groups. If it's possible, I would also like to show R2 and the regression equation on the graph.
Thanks in advance.

You can add a regression line to a ggplot with geom_smooth(method = "lm"); lm() on its own only fits the model and does not draw anything. Given the line you need to create, it may be easier to fit the model yourself with lm() first.
lm() takes a formula and a data argument. Here what you'd want to do is {name_of_regression} <- lm({dependent_var} ~ {independent_var}, data = dat_full). I'm not sure what you want those variables to be, as Fuerza is currently populated with string values.
Also, it's been a little while since I've looked at R, so this is a somewhat verbose solution, but you can filter triceps and biceps into two datasets using the tidyverse package and then fit a regression on each dataset.
library(tidyverse)
# Column name and values must match the data exactly, accents included
biceps <- filter(dat_full, Músculo == "Bíceps")
biceps_reg <- lm({biceps_dep} ~ {biceps_indep}, data = biceps)
And repeat for triceps.
Then, make the ggplot you want, and using geom_smooth() insert your lm using:
ggplot({some_code}) +
{...} +
geom_smooth(method="lm", se = FALSE)
I know that doesn't really solve your problem of wanting to put the charts together, but you can save each ggplot for biceps and triceps and then combine them once you're done, for example with gridExtra::grid.arrange() (plot() by itself will not combine ggplot objects).
Also, here's an R tip: in RStudio you can check any function by using something like:
?plot
?lm
Apologies for the verbosity -- I wanted to provide a quick fix here, but others may have better advice. Additionally, please let me know what your independent and dependent variables would be for the regression (they have to both be numeric here, so Fuerza won't work).
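For what it's worth, here is a minimal sketch of one way to get a line per muscle, assuming the force labels can be read as plain numbers in newtons (with "98,0N" taken to mean 98.0). The column name Fuerza_N and the objects dat_num, fits and eq_labels are made up for this illustration. Fuerza is put on a continuous x axis so that lm() regresses on the actual force values, and the boxes are kept apart with group = Fuerza_N.

```r
library(ggplot2)

# Data from the question, with the force stored as a number (assumption: newtons)
dat_num <- data.frame(
  Fuerza_N = rep(c(19.6, 58.8, 98.0, 274.4), each = 2),
  Músculo  = rep(c("Bíceps", "Tríceps"), times = 4),
  mV.s     = c(3.5227565, -0.0897375, 7.2907255, 1.8571375,
               16.327445, 8.042295, 31.15557, 12.69073),
  standdev = c(0.111590642, 0.187825239, 0.886093185, 0.16351915,
               3.876932131, 2.637289091, 3.713413688, 1.262850285)
)

# One linear model per muscle
fits <- lapply(split(dat_num, dat_num$Músculo),
               function(d) lm(mV.s ~ Fuerza_N, data = d))

# Text for each facet: fitted equation and R^2
eq_labels <- data.frame(
  Músculo = names(fits),
  label = vapply(fits, function(m) {
    sprintf("y = %.2f + %.3f x, R2 = %.2f",
            coef(m)[1], coef(m)[2], summary(m)$r.squared)
  }, character(1))
)

p <- ggplot(dat_num, aes(x = Fuerza_N, y = mV.s)) +
  geom_boxplot(aes(group = Fuerza_N,
                   lower = mV.s - standdev, upper = mV.s + standdev,
                   middle = mV.s,
                   ymin = mV.s - 3 * standdev, ymax = mV.s + 3 * standdev),
               stat = "identity") +
  # Fitted per facet, so each muscle gets its own line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text(data = eq_labels, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.05, vjust = 1.5, inherit.aes = FALSE) +
  facet_wrap(~Músculo) +
  xlab("Fuerza (N)")
print(p)
```

With only four means per muscle the fit is of course very rough, and the boxes are no longer evenly spaced because the x axis is now a true numeric scale.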

Related

Making multi-histogram in ggplot, not recognizing grouping

I'm trying to make a stack of histograms (or a ridgeplot) so I can compare distributions at certain timepoints in my observations.
I used this source for the histogram, and this for the ridge plots.
However, I cannot figure out how to set up my code to make a stacked histogram (or a ridge plot) of length (L) by week, so that I can compare the L distributions at different weeks. I have tried the fill option in ggplot (which in the example seems to produce automatic color differences for the weeks because it is inside aes()?) and other "stacks" using the y= argument, but haven't had much success, I think because of the way my data is set up. If anyone can help me figure out how to make multiple histograms by week, that would be useful!
Thanks!
#fake data
L = rnorm(110, mean=10, sd=2)  # 110 values, to match the length of t below
t = c((rep.int(7,10)), (rep.int(14,20)), rep.int(21,30), rep.int(28,20), (rep.int(31, 20)), (rep.int(36,10)))
fake = data.frame(cbind(L,t))
#subset data into weeks for convenience
dayofweek = seq(7,120,7)
fake2 = as.data.frame(subset(fake, t %in% dayofweek))
fake2$week <- floor(fake2$t/7)
#Plots, basic code
ggplot(fake2, aes(x=L, fill=week)) +
geom_histogram()
I tried facet_grid before, but for some reason facet_wrap actually at least separated the graphs correctly, AND magically made the color fill work:
ggplot(fake2, aes(x=L, fill = week)) +
geom_histogram()+
facet_wrap(.~week)
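A likely reason the fill seemed to be ignored in the first plot is that week is numeric, so ggplot2 treats it as a continuous variable and does not split the bars into groups. Below is a minimal sketch, assuming the goal is one colour per week; the data are a simplified stand-in for the fake data above (only days 7, 14, 21 and 28), and the names p_stack and p_facet are made up for the illustration.

```r
library(ggplot2)

set.seed(1)
# Simplified stand-in for the question's fake data
fake <- data.frame(
  L = rnorm(80, mean = 10, sd = 2),
  t = rep(c(7, 14, 21, 28), times = c(10, 20, 30, 20))
)

# A factor is a discrete variable, so fill = week now defines one group per week
fake$week <- factor(fake$t / 7)

# Stacked histogram, one colour per week
p_stack <- ggplot(fake, aes(x = L, fill = week)) +
  geom_histogram(bins = 20)

# One panel per week, stacked vertically for easier comparison
p_facet <- p_stack + facet_wrap(~week, ncol = 1)

print(p_stack)
print(p_facet)
```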

Plot the outcome of glm()

I am an R beginner (first semester; we use this programme for univariate statistics) and am currently struggling with plotting the outcome of my glm(). I have read quite a few threads and help files on the internet, but I have two problems: 1) I don't understand the advice because it is too advanced, or 2) I understand the advice but when I replicate the code, it doesn't work.
I think I am close to the solution, but my curve doesn't work how it is supposed to. Can anyone tell me what I am doing wrong?
new.data<-data.frame(x=rnorm(50,0,1), y=c("yes", "no"))
mock_model<-glm(y~x, data=new.data, family=binomial)
x1<-seq(min(new.data$x), max(new.data$x), 0.01)
y1<-predict(mock_model, list(x=x1), type="response")
plot(new.data$x, new.data$y, xlab="numeric var", ylab="binary var")
points(x1, y1)
I am new to coding and this platform, so apologies in advance if the information I have provided is not sufficient.
Any advice would be greatly appreciated.
Here's an example using mtcars and the ggplot2 package. The syntax of ggplot2 works roughly like this: You begin a plot with the ggplot() command, within which you can (but don't have to) define aesthetics (the aes() option), which include selection of axis variables, but can also contain options to change the visuals, like colors, linewidths etc. If you define the axis variables within ggplot(), don't forget to put the data assignment (see example below) outside of aes().
Afterwards, you add layers of geoms to plot specific things, like data points with geom_point(), lines with geom_line() or a lot of other fun things. When you want to use the variables and data assigned in the ggplot() command, just leave the geom empty (apart from any visual aes() options you want to use for that specific geom). However, you can define new data and variables for a geom, for example to use different data sources in the same plot.
data(mtcars)
model_shift <- glm(am ~ mpg, data = mtcars, family = 'binomial')
x <- seq(min(mtcars$mpg), max(mtcars$mpg), .1)
y <- predict(model_shift, list(mpg = x), type = 'response')
plot_data <- data.frame(mpg = x, am = y)
library(ggplot2)
ggplot(aes(x = mpg, y = am), data = plot_data) +
geom_point()
Or with a line instead of points:
ggplot(aes(x = mpg, y = am), data = plot_data) +
geom_line()
To get a glimpse of the seemingly endless possibilities of ggplot2, have a look at these 'Top 50' ggplot2 visualizations. To learn the package-specific language, see this tutorial or check your university's library for Hadley Wickham's book ggplot2: elegant graphics for data analysis.
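As a possible extension of the example above, the fitted curve can be drawn on top of the observed 0/1 values by giving geom_line() its own data. This is only a sketch; the name pred_grid is made up for the illustration.

```r
library(ggplot2)

model_shift <- glm(am ~ mpg, data = mtcars, family = binomial)

# Grid of mpg values with the predicted probability of am == 1
pred_grid <- data.frame(mpg = seq(min(mtcars$mpg), max(mtcars$mpg), by = 0.1))
pred_grid$am <- predict(model_shift, newdata = pred_grid, type = "response")

p <- ggplot(mtcars, aes(x = mpg, y = am)) +
  geom_point() +                 # observed values (0 or 1)
  geom_line(data = pred_grid)    # fitted probabilities, same x and y names
print(p)
```

Because pred_grid uses the same column names as the aesthetics set in ggplot(), the line layer needs no aes() of its own. The response has to be numeric 0/1 (as am is here) for the points and the curve to share one y axis.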

Calculation of density estimate in density2d?

I have a more general question regarding the principle behind density2d.
I'm using ggplot and the density2d function to visualize animal movements. My idea was calculating heat maps showing where the animal is most of the time and/or to identify areas of particular interest. Yet, the density2d function sometimes generates rather inexplicable plots.
Here's what I mean:
library(ggplot2)
library(data.table)
set.seed(4)
x<-runif(50,1,599)
y<-runif(50,1,599)
df<-data.table(x,y)
# The + must end a line; a line that starts with + is parsed as a separate statement
ggplot(df,aes(x=x,y=y)) +
stat_density2d(aes(x=x,y=y,fill=..level..,alpha=..level..),bins=50,geom="polygon") +
coord_equal(xlim=c(0,600),ylim=c(0,600)) +
expand_limits(x=c(0,600),y=c(0,600)) +
geom_path()
which looks like this:
There are areas with a density estimate but without data (around x:50, y:300).
Now compare with this:
set.seed(13)
x<-runif(50,1,599)
y<-runif(50,1,599)
df<-data.table(x,y)
ggplot(df,aes(x=x,y=y)) +
stat_density2d(aes(x=x,y=y,fill=..level..,alpha=..level..),bins=50,geom="polygon") +
coord_equal(xlim=c(0,600),ylim=c(0,600)) +
expand_limits(x=c(0,600),y=c(0,600)) +
geom_path()
which looks like this:
Here there are regions "without" a density estimate but with actual data (around x:100,y:550).
Someone asked a related question:
Create heatmap with distribution of attribute values in R (not density heatmap)
but there are no satisfactory answers to be found.
So my question would be (i) Why? and (ii) How to avoid/adjust if possible?
This may be helpful. I am not that familiar with stat_density2d, but after looking at your code and the ggplot2 documentation (http://docs.ggplot2.org/0.9.2.1/stat_density2d.html), I suspected that ..level.. might not be the right variable, so I tried ..density.. instead. Someone else may be able to explain why density is needed; in the meantime, I think this is the graph you wanted.
ggplot(data = df, aes(x = x, y = y)) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
geom_path() +
coord_equal(xlim=c(0,600),ylim=c(0,600)) +
expand_limits(x=c(0,600),y=c(0,600))
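On the "why": stat_density2d computes a 2D kernel density estimate with MASS::kde2d(), and when no bandwidth h is given it is chosen with MASS::bandwidth.nrd(). Each point is therefore smeared over a fairly wide area, so the estimate can be non-zero where no point lies, and an isolated point can fall below the lowest contour level that gets drawn. A rough sketch of how the bandwidth changes the picture follows; the divisor 3 and the names dens_wide and dens_narrow are arbitrary choices made for the illustration.

```r
library(MASS)
library(ggplot2)

set.seed(4)
x <- runif(50, 1, 599)
y <- runif(50, 1, 599)
df <- data.frame(x, y)

# Default bandwidths, as used when h is not supplied
h_default <- c(bandwidth.nrd(x), bandwidth.nrd(y))

# The same estimate on a 100 x 100 grid with a wide and a narrower bandwidth
dens_wide   <- kde2d(x, y, h = h_default,     n = 100, lims = c(0, 600, 0, 600))
dens_narrow <- kde2d(x, y, h = h_default / 3, n = 100, lims = c(0, 600, 0, 600))

# In ggplot2 the bandwidth is passed through the h argument
p <- ggplot(df, aes(x = x, y = y)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon",
                  h = h_default / 3) +
  geom_point() +
  coord_equal(xlim = c(0, 600), ylim = c(0, 600))
print(p)
```

A smaller h keeps the estimate closer to the individual points, at the cost of a bumpier surface; after_stat() needs a reasonably recent ggplot2, and ..level.. is the older spelling of the same thing.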

How to make ggplot2 run with 2 layers?

Here's my code:
tmp <- data.frame(t_year = rnorm(100,0,1),
labs = c(rep("Linear",50), rep("Spline",50)),
STUDY_PARTICIPANT_ID = rep(seq(1,50),2),
logpsa = rnorm(100,0.5,1),
mypredict = rnorm(100,1,2))
p <- ggplot(tmp) +
geom_line(aes(t_year,
mypredict,
group = as.factor(labs),
color = as.factor(labs))) +
geom_line(aes(t_year,
logpsa,
group = STUDY_PARTICIPANT_ID,
color = STUDY_PARTICIPANT_ID))
It runs with either one of the geom_line() calls on its own, but not when I try to plot both. I was hoping it would treat them separately, but I don't think that's the case. Does anyone have any suggestions? I originally used geom_smooth() for the fitted lines, but I was unable to add a legend at the side of the ggplot. Therefore, I extracted the fitted values, put them in the dataset, and was just going to plot them with geom_line(). All I wanted was a label for my linear fit line and my spline. The data here doesn't show the trend, but it will give you the error messages that I was getting. Thank you for your patience with my first post.
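One probable cause is that both layers map color, once to a discrete variable (labs) and once to a numeric one (STUDY_PARTICIPANT_ID), and a plot has only one colour scale. A minimal sketch of a workaround, assuming the participant lines do not need a legend of their own: draw them in a fixed grey and map colour only for the fitted lines. The legend title "Fit" is made up for the illustration.

```r
library(ggplot2)

set.seed(1)
tmp <- data.frame(t_year = rnorm(100, 0, 1),
                  labs = c(rep("Linear", 50), rep("Spline", 50)),
                  STUDY_PARTICIPANT_ID = rep(seq(1, 50), 2),
                  logpsa = rnorm(100, 0.5, 1),
                  mypredict = rnorm(100, 1, 2))

p <- ggplot(tmp, aes(x = t_year)) +
  # Individual trajectories: grouped by participant, constant colour set outside aes()
  geom_line(aes(y = logpsa, group = STUDY_PARTICIPANT_ID), colour = "grey70") +
  # Fitted values: the only layer that uses the colour scale, so it gets the legend
  geom_line(aes(y = mypredict, colour = labs)) +
  labs(colour = "Fit")
print(p)
```

Note that assigning the plot to p does not draw it; it has to be printed (or typed at the console) to appear.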

How to make an overall boxplot alongside factors in R?

I am trying to create a boxplot that shows all of the factor levels of a variable, along with their sample sizes, and at the end of the plot I also want an overall boxplot that combines all of the values into one. I am using the following code to do everything except the overall plot:
library(ggplot2)
library(plyr)
xlabels <- ddply(extract8, .(Fuel), summarize, xlabels = paste(unique(Fuel), '\n(n = ', length(Fuel),')'))
ggplot(extract8, aes(x = Fuel, y = Exfiltration.Fraction.Percentage))+geom_boxplot()+
stat_boxplot(geom='errorbar', linetype=1) +
geom_boxplot(fill="pink") + geom_hline(yintercept = 0.4) +
scale_x_discrete(labels = xlabels[['xlabels']]) + ggtitle("Exfiltration Fraction (%) by Fuel Type")
Not sure on how to proceed regarding adding a boxplot that combines all of the factors into one.
This is certainly not the most elegant way to solve it, but it works:
1. Copy your dataset into a new object.
2. Within the new object, replace the content of the variable containing the factors with the label you would like, for instance, "Total".
3. Use rbind to attach the old and new objects together and assign the result to the new object.
4. In ggplot, replace the old object with the new object.
I had the same issue, couldn't find an answer and proceeded this way.
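The steps above can be sketched as follows. Since extract8 is not available here, the built-in iris data stands in for it, with Species playing the role of Fuel; the names dat, dat_total and dat_all are made up for the illustration.

```r
library(ggplot2)

# Stand-in for the original dataset (assumption: one grouping column, one value column)
dat <- iris[, c("Species", "Sepal.Length")]
dat$Species <- as.character(dat$Species)

# Steps 1 and 2: copy the data and overwrite the grouping variable with "Total"
dat_total <- dat
dat_total$Species <- "Total"

# Step 3: stack the original rows and the relabelled copy
dat_all <- rbind(dat, dat_total)

# Put "Total" last so the overall box appears at the end of the axis
dat_all$Species <- factor(dat_all$Species,
                          levels = c(unique(dat$Species), "Total"))

# Step 4: plot the combined object
p <- ggplot(dat_all, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(fill = "pink")
print(p)
```

Every observation now appears twice in dat_all, so per-group sample sizes computed from it are still correct, but the combined object should not be reused for modelling.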
