ggplot geom_boxplot and plotting last value with geom_point - r

I'm new to R. I am trying to plot the last value of each variable in a data frame on top of a boxplot. This is what I tried, without success:
ggplot(iris, aes(x=Species,y=Sepal.Length)) +
geom_boxplot() +
geom_point(iris, aes(x=unique(iris$Species), y=tail(iris,n=1)))
Thanks, Bill

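One approach is to compute the last value per species first, then pass that summary to geom_point() through its data argument:
library(tidyverse)
# last Sepal.Length value within each species
iris1 <- iris %>%
  group_by(Species) %>%
  summarise(LastVal = last(Sepal.Length))
# boxplot of all values, with the last value per species overlaid as a point
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  geom_point(data = iris1, aes(x = Species, y = LastVal))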
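The attempt in the question fails for two reasons: the first positional argument of geom_point() is mapping, so a data frame has to be passed as data = ..., and tail(iris, n = 1) returns one whole row rather than one value per species. As a hedged alternative sketch (assuming dplyr 1.0.0 or later, where slice_tail() is available; last_rows and p_last are illustrative names), you can keep the last row of each group and let the point layer inherit its aesthetics from ggplot(). "Last" here means last in the row order of the data frame.

```r
library(dplyr)
library(ggplot2)

# Keep the last row of each species (last_rows is a hypothetical name)
last_rows <- iris %>%
  group_by(Species) %>%
  slice_tail(n = 1) %>%
  ungroup()

# The point layer inherits x = Species and y = Sepal.Length from ggplot()
p_last <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  geom_point(data = last_rows, colour = "red", size = 3)

p_last
```

Because last_rows keeps the original column names, no extra aes() is needed in geom_point(); the colour and size are only there to make the points stand out.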

Related

Same y-axis range with ggarrange if I am already using facets and cannot use them again

My question is basically a follow-up to this question. However, the answer there bypasses ggarrange entirely and instead hands the whole problem over to ggplot's facets.
That doesn't work for me, since I am already using facets in the sub-plots and cannot use them again.
Here is some example code. How can I give the two plots joined with ggarrange the same y-axis range (without setting the limits manually, of course)?
mtcars %>%
group_split(vs) %>%
map(~ggplot(., aes(x = mpg, y = wt)) +
geom_point() +
facet_grid(rows = vars(am), cols = vars(gear))) %>%
ggarrange(plotlist = .)
As you can see, the left image's y-axis ranges from 2 to 5, while the right plot's y-axis ranges from 1.5 to 3.5. How can I make them be the same?
I'm once again arguing for abandoning the 'ggarrange' approach, this time in favour of the {patchwork} package, which allows you to apply an operation to all previous plots. In this case, we can use & scale_y_continuous(limits = ...) to set the limits for all plots.
library(ggplot2)
library(dplyr)
library(purrr)
library(patchwork)
mtcars %>%
  group_split(vs) %>%
  map(~ggplot(., aes(x = mpg, y = wt)) +
        geom_point() +
        facet_grid(rows = vars(am), cols = vars(gear))) %>%
  wrap_plots() &
  scale_y_continuous(limits = range(mtcars$wt))
Created on 2022-12-08 by the reprex package (v2.0.0)
One option would be to compute and add the range of your x and y variables to your dataset before splitting, which could then be used to set the limits.
library(dplyr)
library(ggplot2)
library(ggpubr)
library(purrr)
mtcars %>%
  mutate(across(c(mpg, wt), list(range = ~list(range(.x))))) %>%
  group_split(vs) %>%
  map(~ggplot(., aes(x = mpg, y = wt)) +
        geom_point() +
        scale_x_continuous(limits = .$mpg_range[[1]]) +
        scale_y_continuous(limits = .$wt_range[[1]]) +
        facet_grid(rows = vars(am), cols = vars(gear))) %>%
  ggarrange(plotlist = .)
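A possibly simpler variant of the same idea, offered as a sketch rather than the answerer's method: compute the ranges once from the full data before splitting and reuse them inside the plotting function, so no list columns are needed. The names x_lim, y_lim and plots are illustrative, and base split()/lapply() is used here in place of group_split()/map().

```r
library(ggplot2)
library(ggpubr)

# Shared limits computed once from the full data, not typed in by hand
x_lim <- range(mtcars$mpg)
y_lim <- range(mtcars$wt)

# One faceted plot per value of vs, all using the same axis limits
plots <- lapply(split(mtcars, mtcars$vs), function(d) {
  ggplot(d, aes(x = mpg, y = wt)) +
    geom_point() +
    scale_x_continuous(limits = x_lim) +
    scale_y_continuous(limits = y_lim) +
    facet_grid(rows = vars(am), cols = vars(gear))
})

ggarrange(plotlist = plots)
```

Since the limits come from range() on the unsplit data, they follow the data automatically if mtcars is swapped for another data frame.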

ggplot for each column in a data

I am missing some basics in R.
How do I make a plot for each column in a data frame?
I have tried making plots for each column separately. Is there an easier way?
library(dplyr)
library(ggplot2)
data(economics)
#scatter plots
ggplot(economics,aes(x=pop,y=pce))+
geom_point()
ggplot(economics,aes(x=pop,y=psavert))+
geom_point()
ggplot(economics,aes(x=pop,y=uempmed))+
geom_point()
ggplot(economics,aes(x=pop,y=unemploy))+
geom_point()
#boxplots
ggplot(economics,aes(y=pce))+
geom_boxplot()
ggplot(economics,aes(y=pop))+
geom_boxplot()
ggplot(economics,aes(y=psavert))+
geom_boxplot()
ggplot(economics,aes(y=uempmed))+
geom_boxplot()
ggplot(economics,aes(y=unemploy))+
geom_boxplot()
All I'm looking for is one 2*2 grid of box plots and one 2*2 grid of scatter plots with ggplot2. I understand there is facet_grid, but I have failed to work out how to use it. (I believe this can be achieved easily with par(mfrow = ...) and base R plots.) I also saw an approach elsewhere that involved widening the data, which I didn't understand.
In cases like this the solution is almost always to reshape the data from wide to long format.
# scatter plots: every variable against pop, one panel per variable
economics %>%
  select(-date) %>%
  tidyr::gather(variable, value, -pop) %>%
  ggplot(aes(x = pop, y = value)) +
  geom_point(size = 0.5) +
  facet_wrap(~ variable, scales = "free_y")
# boxplots: one panel per variable (pop included)
economics %>%
  tidyr::gather(variable, value, -date) %>%
  ggplot(aes(y = value)) +
  geom_boxplot() +
  facet_wrap(~ variable, scales = "free_y")
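For readers on tidyr 1.0.0 or later, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshaping with that function, with ncol = 2 added to get the 2*2 layout asked for; econ_long, p_scatter and p_box are illustrative names, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Long format: one row per (pop, variable, value) combination
econ_long <- economics %>%
  select(-date) %>%
  pivot_longer(-pop, names_to = "variable", values_to = "value")

# 2 x 2 grid of scatter plots against pop
p_scatter <- ggplot(econ_long, aes(x = pop, y = value)) +
  geom_point(size = 0.5) +
  facet_wrap(~ variable, ncol = 2, scales = "free_y")

# 2 x 2 grid of boxplots for the same four variables
p_box <- ggplot(econ_long, aes(y = value)) +
  geom_boxplot() +
  facet_wrap(~ variable, ncol = 2, scales = "free_y")

p_scatter
p_box
```

scales = "free_y" matters here because the four variables are on very different scales; without it the panels with small values would be squashed flat.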

Display grouped and combined boxplot in a single plot in R

I am trying to display grouped boxplot and combined boxplot into one plot. Take the iris data for instance:
data(iris)
p1 <- ggplot(iris, aes(x=Species, y=Sepal.Length)) +
geom_boxplot()
p1
I am trying to compare the overall distribution with the distribution within each category. Is there a way to display a boxplot of all samples to the left of these three grouped boxplots?
Thanks in advance.
You can rbind a new version of iris, where Species equals "All" for all rows, to iris before piping to ggplot
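library(dplyr)
library(ggplot2)
# stack a copy of iris labelled "All" under the original rows, then plot
p1 <- iris %>%
  rbind(iris %>% mutate(Species = 'All')) %>%
  ggplot(aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()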
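One caveat, as far as I can tell: when the copy is appended this way, "All" becomes the last factor level of Species, so its box is drawn on the right rather than on the left as the question asks. A hedged sketch of one way to control the order is to set the factor levels explicitly; iris_all and p_all are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Stack the original rows and an "All" copy, then put "All" first
iris_all <- iris %>%
  mutate(Species = as.character(Species)) %>%
  bind_rows(mutate(iris, Species = "All")) %>%
  mutate(Species = factor(Species, levels = c("All", levels(iris$Species))))

# Discrete x positions follow the factor level order, so "All" is leftmost
p_all <- ggplot(iris_all, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

p_all
```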
Yes, you can just create a column for all species as follows:
iris = iris %>% mutate(all = "All Species")
p1 <- ggplot(iris) +
geom_boxplot(aes(x=Species, y=Sepal.Length)) +
geom_boxplot(aes(x=all, y=Sepal.Length))
p1

How to add x-axis ticks and labels below every point in a ggplot2 scatter plot

Let's say I have this plot
iris2 <- iris %>% data.table
iris2 <- iris2[Sepal.Length<6]
iris2[,Sepal.Width:=mean(Sepal.Width),by=Sepal.Length]
iris2 %>% ggplot(aes(x=Sepal.Length,y=Sepal.Width,color=Species,group=Species)) +
geom_line() + geom_point()
Which renders:
How can I add axis ticks and labels so that for every point there is a corresponding tick and label?
You need to use scale_x_continuous to achieve what you want since your x axis is not discrete. The following code should work:
iris2 %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = Species, group = Species)) +
  geom_line() +
  geom_point() +
  # one break per distinct x value in the plotted (filtered) data
  scale_x_continuous(breaks = unique(iris2$Sepal.Length))

R ggplot: Two histograms (based on two different column) in one graph

I want to put two histograms together in one graph, but each histogram is based on a different column. Currently I can do it like this, but position = "dodge" does not work here, and there is no legend (a different color for each column).
p <- ggplot(data = temp2.11)
p <- p + geom_histogram(aes(x = diff84, y = (..count..)/sum(..count..)),
                        alpha = 0.3, fill = "red", binwidth = 2, position = "dodge")
p <- p + geom_histogram(aes(x = diff08, y = (..count..)/sum(..count..)),
                        alpha = 0.3, fill = "green", binwidth = 2, position = "dodge")
You have to reshape your table to long format, then map the resulting grouping variable to an aesthetic (here, fill) in ggplot. Using the iris data set as an example:
data(iris)
# your method
library(ggplot2)
ggplot(data = iris) +
  geom_histogram(aes(x = Sepal.Length, y = (..count..)/sum(..count..)),
                 alpha = 0.3, fill = "red", binwidth = 2, position = "dodge") +
  geom_histogram(aes(x = Sepal.Width, y = (..count..)/sum(..count..)),
                 alpha = 0.3, fill = "green", binwidth = 2, position = "dodge")
# long-format method
library(reshape2)
iris2 = melt(iris[,1:2])
ggplot(data = iris2) +
  geom_histogram(aes(x = value, y = (..count..)/sum(..count..), fill = variable),
                 alpha = 0.3, binwidth = 2, position = "identity")
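Two hedged notes on the long-format version. First, the ..count.. notation is deprecated as of ggplot2 3.4.0 in favour of after_stat(). Second, as far as I understand, sum(..count..) in a single layer runs over both variables together, so the bars of both histograms combined sum to 1 rather than each histogram summing to 1. The sketch below assumes ggplot2 3.3.0 or later and tidyr 1.0.0 or later, and uses density * width as the per-group proportion; iris_long, p_hist and the binwidth of 0.5 are illustrative choices, not part of the original answer.

```r
library(tidyr)
library(ggplot2)

# Long format: one row per measurement, with the column name in `variable`
iris_long <- pivot_longer(iris[, 1:2],
                          cols = c(Sepal.Length, Sepal.Width),
                          names_to = "variable", values_to = "value")

# density * width is each bin's share of its own group
p_hist <- ggplot(iris_long, aes(x = value, fill = variable)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 alpha = 0.3, binwidth = 0.5, position = "identity")

p_hist
```

Mapping fill to variable is what produces the legend; position = "identity" overlays the two histograms, and switching it to "dodge" places the bars side by side.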
