Box plots with individual observations - R

Very beginner question here:
I have a dataset of 4 columns of values and I need to create a graph with 4 boxplots showing average and standard deviation, and I wanted to know how to also show the individual observations as points (with ggplot2).
Thank you for your help!

This is relatively simple, as you can add multiple geom_* layers to a single ggplot.
Here is a small example that combines geom_boxplot() with geom_jitter().
If you also want the outliers to stand out (the jittered points include them as ordinary points), you can give the box plot's outlier points a different colour or point shape, e.g. geom_boxplot(outlier.colour = "red").
library(tidyverse)
iris %>%
  ggplot(aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(outlier.colour = "red") + # Add the boxplot geom
  geom_jitter(width = 0.1)               # Add the points with a random jitter on the x-axis
Created on 2022-08-11 by the reprex package (v2.0.0)
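The question's data has the four variables as separate columns, while ggplot2 expects one column naming the group and one holding the values. A minimal sketch, assuming a wide data frame with four numeric columns (the names col_a to col_d and the simulated values are hypothetical): reshape with tidyr::pivot_longer(), hide the box plot's own outlier points with outlier.shape = NA so the jittered points are not drawn twice, and overlay the mean and one standard deviation with stat_summary(). Note that a box plot itself shows the median and quartiles, not the mean and SD, so the red summaries are a separate layer.

```r
library(ggplot2)
library(tidyr)

set.seed(42)  # reproducible simulated data

# Hypothetical wide data: four columns of values, one per group
wide <- data.frame(
  col_a = rnorm(20, mean = 10),
  col_b = rnorm(20, mean = 12),
  col_c = rnorm(20, mean = 9),
  col_d = rnorm(20, mean = 11)
)

# Long format: one row per observation, 'variable' names the original column
long <- pivot_longer(wide, cols = everything(),
                     names_to = "variable", values_to = "value")

# Mean and mean +/- 1 SD in the y/ymin/ymax form that stat_summary() expects
mean_sd <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}

p <- ggplot(long, aes(x = variable, y = value)) +
  geom_boxplot(outlier.shape = NA) +          # box plot without its own outlier points
  geom_jitter(width = 0.1, alpha = 0.6) +     # every observation, jittered horizontally
  stat_summary(fun.data = mean_sd, geom = "errorbar",
               colour = "red", width = 0.2) + # mean +/- 1 SD
  stat_summary(fun = mean, geom = "point",
               colour = "red", size = 3)      # the mean itself
p
```

stat_summary(fun = ...) needs ggplot2 3.3.0 or later; older versions used fun.y.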

Related

Using a boxplot in R

I am trying to make a density plot from 4000 rows (height.values) with 4 different categories (height.ind). This is the code I used:
library(ggplot2)
library(dplyr)  # provides the %>% pipe
plom %>%
  ggplot(aes(x = height.values, color = height.ind)) +
  geom_density() +
  labs(title = "height alimony")
I am able to get a density plot, but there are a lot of lines instead of the 4 I want.
Does anyone have an idea how to fix it?
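No data was shared, so the cause can only be guessed. One common cause (an assumption here) is that height.ind contains more distinct values than the four intended categories, for example because of stray spaces or inconsistent capitalisation; ggplot2 draws one density curve per distinct value of the colour variable. A sketch with simulated data (the plom values and messy labels below are invented) that counts the distinct values and normalises them before plotting:

```r
library(ggplot2)
library(dplyr)

set.seed(1)
# Hypothetical data: 4000 heights whose labels look like 4 groups but are not
# ("a " has a trailing space, "B" differs in case)
plom <- data.frame(
  height.values = rnorm(4000, mean = 170, sd = 10),
  height.ind = sample(c("a", "b", "c", "d", "a ", "B"), 4000, replace = TRUE)
)

# How many distinct categories ggplot2 will actually see
n_distinct(plom$height.ind)

# Trim whitespace and lower-case the labels so only the 4 intended groups remain
plom <- plom %>%
  mutate(height.ind = factor(tolower(trimws(height.ind))))

p <- plom %>%
  ggplot(aes(x = height.values, colour = height.ind)) +
  geom_density() +
  labs(title = "height alimony")
p
```

If height.ind already has exactly four values, the extra lines have a different cause and this check at least rules this one out.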

Partially "free_y" Facet Wrap with ggplot

My goal is to produce a column graph showing different element concentrations.
There is a very wide range, so I want to customise the scales of my faceted graph in 3 groups.
That way the graphs can show the variation between samples for each element and still be comparable between elements,
so ideally I would have 3 different scales for groups 1, 2, and 3 in the graph below.
This is the code to make the above graph:
library(ggplot2)
library(scales)  # provides the comma() label formatter
ggplot(binded) +
  aes(y = mean,
      x = sample,
      group = id) +
  geom_col(aes(fill = element)) +
  geom_errorbar(aes(ymin = mean - sd,
                    ymax = mean + sd)) +
  facet_wrap(rang ~ element) +
  scale_x_continuous(breaks = seq(1, 15, by = 1),
                     name = "Sample ID") +
  scale_y_continuous(name = "Elemental Conc. (mg/kg)", labels = comma) +
  theme(legend.position = "none")
And the data used is below.
If I switch the faceting to facet_wrap(rang ~ element, scales = "free_y"), then I get:
Is there any way to make the scales free only within each group of rang?
I suspect I'm going to have to just create 3 separate graphs.
Thanks to Danlooo for suggesting the patchwork package: creating 3 separate graphs, plus another one for the y-axis label, proved successful.
I produced several graphs with the original code and data frames filtered for different concentrations, then used the following patchwork code to produce the final graph:
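library(patchwork)
# p1, p2, p3: one plot per concentration group; p4: a plot holding only the y-axis label
p5 <- (p1 | p2) / p3 + plot_layout(heights = c(1, 2))
(p4 + p5) + plot_layout(widths = c(1, 25))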
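A self-contained sketch of that patchwork approach, assuming the binded columns from the question (sample, element, rang, mean, sd); the element names, group assignment and values below are invented placeholders, and plot_group() is a hypothetical helper. Each group is drawn by its own ggplot with a fixed (shared) y scale across that group's facets, so scales differ only between groups:

```r
library(ggplot2)
library(dplyr)
library(patchwork)

set.seed(7)
# Hypothetical stand-in for 'binded': 15 samples x 6 elements in 3 concentration groups
binded <- expand.grid(sample = 1:15,
                      element = c("Ca", "K", "Fe", "Zn", "Cu", "Pb")) %>%
  mutate(
    rang = case_when(element %in% c("Ca", "K") ~ 1,
                     element %in% c("Fe", "Zn") ~ 2,
                     TRUE ~ 3),
    mean = runif(n(), 0.5, 1.5) * 10^(4 - rang),  # very different magnitudes per group
    sd = mean * 0.1
  )

# Hypothetical helper: one row of facets for a group, y scale shared within it
plot_group <- function(df) {
  ggplot(df, aes(x = sample, y = mean)) +
    geom_col(aes(fill = element)) +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
    facet_wrap(~element, nrow = 1) +
    labs(x = "Sample ID", y = NULL) +
    theme(legend.position = "none")
}

p1 <- plot_group(filter(binded, rang == 1))
p2 <- plot_group(filter(binded, rang == 2))
p3 <- plot_group(filter(binded, rang == 3))

# A narrow, otherwise empty plot that only carries the shared y-axis label
p4 <- ggplot() +
  annotate("text", x = 0, y = 0, label = "Elemental Conc. (mg/kg)", angle = 90) +
  theme_void()

# Same layout as in the answer above
p5 <- (p1 | p2) / p3 + plot_layout(heights = c(1, 2))
final <- (p4 + p5) + plot_layout(widths = c(1, 25))
final
```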

Zig Zag when using geom_line with ggplot in R

I would really appreciate some insight on the zigzagging I get when using the following code in R:
library(ggplot2)
library(dplyr)  # provides the %>% pipe
tbi_military %>%
  ggplot(aes(x = year, y = diagnosed, color = service)) +
  geom_line() +
  facet_wrap(vars(severity))
The dataset consists of 5 variables (3 character, 2 numeric). Any insight would be much appreciated.
This is just an illustration with a standard dataset. Let's say we're interested in plotting the weight of chicks over time depending on a diet. We would attempt to plot this like so:
library(ggplot2)
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
  geom_line()
You can see the zigzag pattern appear because, for each diet and time point, there are multiple observations. Because geom_line() sorts the data by the x-axis value, these show up as a vertical line spanning the range of data points at that time for each diet.
The data has an additional variable called 'Chick' that identifies individual chicks. Including it in the grouping removes the zigzag pattern, and every line then shows the weight over time of one chick.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
  geom_line(aes(group = interaction(Chick, Diet)))  # one line per chick
If you don't have an extra variable that separates out individual trends, you could instead choose to summarise the data per timepoint by, for example, taking the mean at every timepoint.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
  geom_line(stat = "summary", fun = mean)  # one line through the per-timepoint means
Created on 2021-08-30 by the reprex package (v1.0.0)
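Applied to the question's data, a sketch assuming the zigzag comes from several rows per year, service and severity (for example, split by the unnamed fifth variable), and that summing diagnosed over those rows gives the wanted total; both are assumptions, and mean() may suit the data better. The tbi_military values are simulated placeholders using the question's column names, and subgroup is a hypothetical stand-in for the fifth variable:

```r
library(ggplot2)
library(dplyr)

set.seed(3)
# Hypothetical stand-in for tbi_military: 3 rows per year/service/severity
tbi_military <- expand.grid(
  year = 2006:2014,
  service = c("Army", "Navy", "Air Force", "Marines"),
  severity = c("Mild", "Moderate", "Severe"),
  subgroup = c("Active", "Guard", "Reserve"),  # hypothetical fifth variable
  stringsAsFactors = FALSE
)
tbi_military$diagnosed <- rpois(nrow(tbi_military), lambda = 100)

# Any combination with n > 1 has several y values at the same x within one line
dupes <- tbi_military %>%
  count(year, service, severity) %>%
  filter(n > 1)

# Collapse to one row per year/service/severity before drawing lines
tbi_totals <- tbi_military %>%
  group_by(year, service, severity) %>%
  summarise(diagnosed = sum(diagnosed), .groups = "drop")

p <- ggplot(tbi_totals, aes(x = year, y = diagnosed, colour = service)) +
  geom_line() +
  facet_wrap(vars(severity))
p
```

Alternatively, if each subgroup should keep its own line, map it to group or linetype instead of summarising.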

Make multiple geoms animated in ggplot

I am trying to develop an animated plot showing how the rates of three point attempts and assists have changed for NBA teams over time. While the points in my plot are transitioning correctly, I tried to add a vertical and horizontal mean line, however this is staying constant for the overall averages rather than shifting year by year.
library(ggplot2)
library(gganimate)
p <- ggplot(dataBREFPerPossTeams, aes(astPerPossTeam, fg3aPerPossTeam, col = ptsPerPossTeam)) +
  geom_point() +
  scale_color_gradient(low = 'yellow', high = 'red') +
  theme_classic() +
  xlab("Assists Per 100 Possessions") +
  ylab("Threes Attempted Per 100 Possessions") + labs(color = "Points Per 100 Possessions") +
  geom_hline(aes(yintercept = mean(fg3aPerPossTeam)), color = 'blue', linetype = 'dashed') +
  geom_vline(aes(xintercept = mean(astPerPossTeam)), color = 'blue', linetype = 'dashed')
anim <- p + transition_time(as.integer(yearSeason)) + labs(title = 'Year: {frame_time}')
animate(anim, nframes = 300)
Ideally, the two dashed lines would shift as the years progress, however, right now they are staying constant. Any ideas on how to fix this?
I am using datasets::airquality since you have not shared your data. The idea here is that the values for your other geoms (here, the means) need to be variables in your dataset, so that gganimate can connect those values to each frame (i.e. to transition_time).
So what I did was group by frame (here it is Month; for you it will be yearSeason) and then mutate new columns holding the averages of the desired variables. Then, in the geoms, I used those appended variables instead of computing the mean inside the geom. See below:
library(datasets) # datasets::airquality
library(ggplot2)
library(gganimate)
library(dplyr)
g <- airquality %>%
  group_by(Month) %>%
  mutate(mean_wind = mean(Wind),
         mean_temp = mean(Temp)) %>%
  ggplot() +
  geom_point(aes(Wind, Temp, col = Solar.R)) +
  geom_hline(aes(yintercept = mean_temp), color = 'blue', linetype = 'dashed') +
  geom_vline(aes(xintercept = mean_wind), color = 'green', linetype = 'dashed') +
  scale_color_gradient(low = 'yellow', high = 'red') +
  theme_classic() +
  xlab("Wind") +
  ylab("Temp") + labs(color = "Solar.R")
animated_g <- g + transition_time(as.integer(Month)) + labs(title = 'Month: {frame_time}')
animate(animated_g, nframes = 18)
Created on 2019-06-09 by the reprex package (v0.3.0)
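The same fix translated to the question's column names, as a sketch: the dataBREFPerPossTeams values below are simulated placeholders (20 seasons of 30 teams), and only the static plot is built here. The per-season means become columns, and the hline/vline map to them instead of calling mean() inside aes():

```r
library(ggplot2)
library(dplyr)

set.seed(11)
# Hypothetical stand-in for dataBREFPerPossTeams: 30 teams per season
dataBREFPerPossTeams <- data.frame(
  yearSeason = rep(2000:2019, each = 30),
  astPerPossTeam = rnorm(600, mean = 22, sd = 2),
  fg3aPerPossTeam = rnorm(600, mean = 20, sd = 5),
  ptsPerPossTeam = rnorm(600, mean = 105, sd = 4)
)

# Per-season means stored as columns, so every frame gets its own reference lines
season_data <- dataBREFPerPossTeams %>%
  group_by(yearSeason) %>%
  mutate(mean_ast = mean(astPerPossTeam),
         mean_fg3a = mean(fg3aPerPossTeam)) %>%
  ungroup()

p <- ggplot(season_data, aes(astPerPossTeam, fg3aPerPossTeam, col = ptsPerPossTeam)) +
  geom_point() +
  geom_hline(aes(yintercept = mean_fg3a), color = 'blue', linetype = 'dashed') +
  geom_vline(aes(xintercept = mean_ast), color = 'blue', linetype = 'dashed') +
  scale_color_gradient(low = 'yellow', high = 'red') +
  theme_classic() +
  labs(x = "Assists Per 100 Possessions",
       y = "Threes Attempted Per 100 Possessions",
       color = "Points Per 100 Possessions")
```

The animation step then stays as in the question: p + transition_time(as.integer(yearSeason)) + labs(title = 'Year: {frame_time}'), passed to animate().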

ggplot2 | How to plot mean lines for x, y, colour=group, facets=Drug~. Cannot make it look right

I have a dataset with three factors (Group=Between; Drug=Within; Session=Within) and one response variable (DEs2mPre). I am able to plot faceted boxplot using
qplot(Session, DEs2mPre, data = Dummy.Data, colour = Drug, facets = Group~., geom = "boxplot")
I have three groups and two levels of Drug, so I get a nice 3X2 graph with 3 individual panels, one for each group, each showing the two levels of Drug over the sessions. However, instead of boxplots I would like to see lines connecting the means at each session. When I change geom to geom="line", I get a mess of lines that looks like one line for every subject in the dataset, rather than a grouped (mean-like) visualization of the data like what you would see with lineplot.CI (sciplot package).
Is there any way to do that in ggplot2?
Sorry, I couldn't add my graphs because I do not have enough "reputation points".
Thanks for any help in advance.
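You get a mess of lines because ggplot connects all data points within each group by default. You need to tell ggplot to plot the mean of each group instead. The appropriate arguments are stat = "summary" and fun.y = "mean". This qplot() form comes from older ggplot2 releases: qplot()'s stat argument has since been deprecated, fun.y was renamed fun in ggplot2 3.3.0, and qplot() itself is deprecated as of ggplot2 3.4.0.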
qplot(Session, DEs2mPre, data = Dummy.Data, colour = Drug, facets = Group~.,
      stat = "summary", fun.y = "mean", geom = "line")
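With current ggplot2, the same idea is written with ggplot() and stat_summary(). A sketch using simulated Dummy.Data (values invented, column names taken from the question), assuming Session is a factor. group = Drug is set explicitly because with a discrete x axis ggplot2 would otherwise group by every Session and Drug combination and have nothing to connect. The mean plus/minus one standard error bars from ggplot2's mean_se() are an optional addition:

```r
library(ggplot2)

set.seed(5)
# Hypothetical stand-in for Dummy.Data: 10 subjects x 4 sessions x 2 drugs x 3 groups
Dummy.Data <- expand.grid(
  Subject = 1:10,
  Session = factor(1:4),
  Drug = c("Vehicle", "Drug"),
  Group = c("G1", "G2", "G3")
)
Dummy.Data$DEs2mPre <- rnorm(nrow(Dummy.Data), mean = 50, sd = 10)

p <- ggplot(Dummy.Data, aes(x = Session, y = DEs2mPre, colour = Drug, group = Drug)) +
  stat_summary(fun = mean, geom = "line") +                          # line through session means
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) + # mean +/- 1 SE
  facet_grid(Group ~ .)                                             # one row per group
p
```

The line layer contains one summarised point per session, drug and group (24 here) instead of one per subject.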
