Remove certain datapoints from boxplot in ggplot2 - r

I have a box plot with one box per day on the x-axis (Sunday, Monday, Wednesday, Thursday and Friday). I would like to know how to remove two of the boxes from the graph. To be specific, I don't want boxes for Monday and Wednesday. The code I used was:
ggplot(contents2019, aes(x = days, y = time)) +
geom_boxplot() +
xlab("Day") +
ylab("Time") +
theme_bw()

You can filter your dataframe beforehand:
library(tidyverse)
contents2019 %>%
filter(!days %in% c("Monday", "Wednesday")) %>%
ggplot(aes(x = days, y = time)) +
geom_boxplot() +
xlab("Day") +
ylab("Time") +
theme_bw()
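If you would rather not load the whole tidyverse, the same idea can be sketched in base R. The data below is a made-up stand-in for contents2019 (the real values are not shown in the question), and the factor() step is optional: it only pins the order of the remaining days on the x-axis.

```r
library(ggplot2)

# Hypothetical stand-in for contents2019 (values invented for illustration)
set.seed(1)
contents2019 <- data.frame(
  days = rep(c("Sunday", "Monday", "Wednesday", "Thursday", "Friday"), each = 20),
  time = runif(100, min = 0, max = 60)
)

# Base-R equivalent of the filter() step: drop the unwanted days
keep <- subset(contents2019, !days %in% c("Monday", "Wednesday"))

# Optional: fix the order in which the remaining days appear on the x-axis
keep$days <- factor(keep$days, levels = c("Sunday", "Thursday", "Friday"))

p <- ggplot(keep, aes(x = days, y = time)) +
  geom_boxplot() +
  xlab("Day") +
  ylab("Time") +
  theme_bw()
```

Printing p should then draw boxes for Sunday, Thursday and Friday only.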

Related

Plot graph with x-axis showing weeks and year

I have a dataframe that looks like this
dat <- data.frame(
weeks = c(23,24,25,26,23),
year = c(2022,2022,2022,2023,2023),
cases = c(70,98,69,430,56)
)
Now I am trying to plot this data using ggplot2
ggplot(data=dat, aes(x=factor(weeks), y=cases)) +
geom_bar(stat="identity")
I'd like the x-axis to be split so that it shows the weeks in 2022 and the weeks in 2023 as separate groups.
How can I do this?
I figured it out: use facet_grid, then place the group labels outside the axis with strip.placement.
ggplot(data=dat, aes(x=factor(weeks), y=cases)) +
geom_bar(stat="identity")+
facet_grid(. ~ year, scales = "free", switch = "x", space = "free_x") +
theme(strip.placement = "outside")

How to change x-axis ticks to reflect another variable?

I'm working with a data frame that includes both the day and month of the observation:
Day Month
1 January
2 January
3 January
...
32 February
33 February
34 February
...
60 March
61 March
And so on. I am creating a line graph in ggplot that reflects the day by day value of column wp (which can just be assumed to be random values between 0 and 1).
Because there are so many days in the data set, I don't want them to be reflected in the x-axis tick marks. I need to refer to the day column in creating the plot so I can see the day-by-day change in wp, but I just want month to be shown on the x-axis labels. I can easily rename the x-axis title with xlab("Month"), but I can't figure out how to change the tick marks to only show the month. So ideally, I want to see "January", "February", and "March" as the labels along the x-axis.
test <- ggplot(data = df, aes(x=day, y=wp)) +
geom_line(color="#D4BF91", size = 1) +
geom_point(color="#D4BF91", size = .5) +
ggtitle("testex") +
xlab("Day") +
ylab("Value") +
theme_fivethirtyeight() + theme(axis.title = element_text())
When I treat the day as an actual date, I get the desired result:
library(tidyverse)  # provides tibble() and ggplot()
library(lubridate)
set.seed(21540)
dat <- tibble(
date = seq(mdy("01-01-2020"), mdy("01-01-2020")+75, by=1),
day = seq_along(date),
wp = runif(length(day), 0, 1)
)
ggplot(data = dat, aes(x=date, y=wp)) +
geom_line(color="#D4BF91", size = 1) +
geom_point(color="#D4BF91", size = .5) +
ggtitle("testex") +
xlab("Day") +
ylab("Value") +
theme(axis.title = element_text())
More generally, though, you could use scale_x_continuous() to do this:
library(tidyverse)  # provides tibble(), group_by(), slice() and ggplot()
library(lubridate)
set.seed(21540)
dat <- tibble(
date = seq(mdy("01-01-2020"), mdy("01-01-2020")+75, by=1),
day = seq_along(date),
wp = runif(length(day), 0, 1),
month = as.character(month(date, label=TRUE))
)
firsts <- dat %>%
group_by(month) %>%
slice(1)
ggplot(data = dat, aes(x=day, y=wp)) +
geom_line(color="#D4BF91", size = 1) +
geom_point(color="#D4BF91", size = .5) +
ggtitle("testex") +
xlab("Day") +
ylab("Value") +
scale_x_continuous(breaks=firsts$day, labels=firsts$month) +
theme(axis.title = element_text())
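Since the question asked for full month names ("January", "February", "March"), here is a minimal sketch of the same scale_x_continuous() idea with the breaks computed in base R. It assumes, as in the answer above, that day 1 is 1 January 2020; the data frame and the firsts helper are illustrative, not the asker's real data. month.name is used instead of format(date, "%B") so the labels do not depend on the locale.

```r
library(ggplot2)

set.seed(21540)
# Illustrative data: 76 consecutive days starting 2020-01-01 (an assumption)
dat <- data.frame(date = seq(as.Date("2020-01-01"), by = "day", length.out = 76))
dat$day <- seq_along(dat$date)
dat$wp <- runif(nrow(dat), 0, 1)

# Full English month name for every row, independent of the system locale
dat$month <- month.name[as.integer(format(dat$date, "%m"))]

# First row of each month: its day index becomes a break, its name the label
firsts <- dat[!duplicated(dat$month), c("day", "month")]

p <- ggplot(dat, aes(x = day, y = wp)) +
  geom_line(color = "#D4BF91") +
  geom_point(color = "#D4BF91", size = 0.5) +
  scale_x_continuous(breaks = firsts$day, labels = firsts$month) +
  xlab("Month") +
  ylab("Value")
```

The plot is still drawn against day, so the day-by-day changes stay visible; only the tick positions and labels change.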

position_dodge2 does not work for multiple values in a dataframe

I want to create a box plot with dodge as a position adjustment. The problem is that "position_dodge2" is not working when I use two layers of geom_boxplot with different data sources. An "obvious" solution is to merge both dataframes and then use ggplot with a single dataframe, but I do not know how to organize the data to make it work.
The data is composed of precipitation estimation from different sources for the same location for 6 months.
Date Prec1 Prec2 ...
01-01-2000 0.2 0.8 ...
.
.
.
Then, I used the "gather" function to collect the precipitation values into one column. I also added month and week columns:
Precipitation Date Value Month Week
Prec1 01-01-2000 0.2 Jan 1
Prec2 01-01-2000 0.6 Jan 1
.
.
.
This is the code I have:
p1 <- ggplot(data = df, aes(x=Date, y=Value, group = Week),
position = position_dodge2(width = 0.5)) +
geom_boxplot(data = subset(df, Model != "PrecF1")) +
geom_boxplot(data = subset(df, Model == "PrecF1"),
color = "red",
width = 0.32) +
xlab("Date") + ylab("Precipitation (mm)") +
scale_x_date(date_breaks = "1 month", date_labels = "%B") # adjust the x axis breaks
p1 + theme(axis.text.x = element_text(size=8, angle=0))
I want to compare "Prec1" values against the other estimations using boxplots side-by-side but position_dodge2 does not work :(
I appreciate any help. Please let me know if I am overlooking something obvious or I should provide the data. The output is attached.
The data is here: PrecData
Edited in the light of subsequent comments...
The problem is that you are using a continuous x axis for what is essentially a discrete set of data points, so dodging does not really make sense (as it changes the apparent date value). You can solve this by basing the x axis on as.factor(week), and fiddling the labels appropriately. Try the following...
ggplot(data = df, aes(x=as.factor(Week), y=Value, colour = (Model=="PrecF1"))) +
geom_boxplot()
An alternative would be to use facets. Try the following...
ggplot(data = df, aes(x=Date, y=Value, group = Week)) +
geom_boxplot() +
facet_wrap(~(Model == "PrecF1")) +
xlab("Date") + ylab("Precipitation (mm)") +
scale_x_date(date_breaks = "1 month", date_labels = "%B") +
theme(axis.text.x = element_text(size=8, angle=0))
This will give you two facets side by side, depending on whether Model == "PrecF1" is true or false.
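To make the first suggestion (a discrete week axis) concrete, here is a rough sketch on invented data. The column names Model, Week and Value follow the question's code, but the model names other than "PrecF1", the values and the is_ref helper column are assumptions for illustration only. With a discrete x axis, a single geom_boxplot layer can dodge the two groups within each week.

```r
library(ggplot2)

set.seed(42)
# Invented long-format data: 3 models x 4 weeks x 10 observations each
df <- expand.grid(
  Week = 1:4,
  Model = c("PrecF1", "Prec2", "Prec3"),  # names other than PrecF1 are hypothetical
  rep = 1:10,
  stringsAsFactors = FALSE
)
df$Value <- rexp(nrow(df), rate = 0.5)

# Flag the reference model so it can be coloured and dodged separately
df$is_ref <- df$Model == "PrecF1"

p <- ggplot(df, aes(x = factor(Week), y = Value, colour = is_ref)) +
  geom_boxplot(position = position_dodge2(preserve = "single")) +
  scale_colour_manual(
    values = c("FALSE" = "black", "TRUE" = "red"),
    labels = c("Other models", "PrecF1"),
    name = NULL
  ) +
  xlab("Week") +
  ylab("Precipitation (mm)")
```

Each week then shows two boxes side by side: the pooled other models in black and PrecF1 in red.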

How to plot monthly sales data correctly?

I'm trying to plot the monthly sales data with RStudio, but the dates on the x-axis are not showing correctly.
My code:
uc_ts_plot <- ggplot(monthly_sales, aes(DATE,DAUTONSA)) + geom_line(na.rm=TRUE) +
xlab("Month") + ylab("Auto Sales in Thousands") +
scale_x_date(labels = date_format(format= "%b-%Y"),
breaks = date_breaks("1 year")) +
stat_smooth(colour = "green")
uc_ts_plot
I expect the dates on the x-axis to be displayed as Jan-2011, Jan-2012, as shown here.
All I'm getting is a 0001-01 at the left end and a 0002-01 at the right end of the x-axis.
The plot you refer to only covers the years 2011 to 2018, whereas your data starts in 1967.
The code below reproduces that plot:
library(tidyverse)
library(scales)
library(lubridate)
monthly_sales %>%
mutate(DATE = as.Date(DATE)) %>%
filter(year(DATE) >= 2011 & year(DATE) < 2018) %>%
ggplot() + aes(DATE,DAUTONSA) +
geom_line(na.rm=TRUE) +
xlab("Month") + ylab("Auto Sales in Thousands") +
scale_x_date(labels = date_format(format= "%b-%Y"),
breaks = date_breaks("1 year")) +
stat_smooth(colour = "green")
You can remove the filter step to plot the data for all years, but that clutters the x-axis with a lot of labels.
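The step doing most of the work above is the as.Date() conversion, since scale_x_date() expects a column of class Date. Here is a small base-R sketch of the same pipeline, assuming DATE was read in as text in "YYYY-MM-DD" form. The monthly_sales values below are invented, and yr and recent are helper names introduced only for this example.

```r
library(ggplot2)

set.seed(7)
# Invented stand-in for monthly_sales, with DATE stored as text (an assumption)
monthly_sales <- data.frame(
  DATE = format(seq(as.Date("2010-01-01"), as.Date("2018-12-01"), by = "month")),
  DAUTONSA = round(runif(108, min = 800, max = 1200))
)

# Convert the text column to a real Date so that scale_x_date() can be used
monthly_sales$DATE <- as.Date(monthly_sales$DATE)

# Keep 2011 through 2017, matching the filter() condition above
yr <- as.integer(format(monthly_sales$DATE, "%Y"))
recent <- monthly_sales[yr >= 2011 & yr < 2018, ]

p <- ggplot(recent, aes(DATE, DAUTONSA)) +
  geom_line(na.rm = TRUE) +
  xlab("Month") +
  ylab("Auto Sales in Thousands") +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")
```

The date_breaks and date_labels arguments are ggplot2's built-in shorthand for the date_breaks() and date_format() helpers from scales.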

geom_area ggplot fill above threshold with data subset

I read several posts on how to fill above geom_area plots using the geom_ribbon function, but none have also dealt with subsets of data.
Consider the following data and plot. I simply want to fill above a threshold, 25 in this example (y-axis), but also fill only within a subset of days within each month, in this case between days 2 and 12. In sum, both criteria must be met in order to fill, and I'm trying to get a smooth fill.
I can improve upon my graph below by using the approx function to interpolate a lot of points on my line, but it still does not handle my subset and connects fill lines between months.
library(ggplot2)
y = sample(1:50)
x = seq(as.Date("2011-12-30"), as.Date("2012-02-17"), by="days", origin="1970-01-01")
z = format(as.Date(x), "%d")
z=as.numeric(z)
df <- data.frame(x,y,z)
plot<-ggplot(df, aes(x=x, y=y)) +
geom_area(fill="transparent") +
geom_ribbon(data=subset(df, z>=2 & z<=12), aes(ymin=25, ymax=ifelse(y< 25,20, y)), fill = "red", alpha=0.5) +
geom_line() +
geom_hline(yintercept = 25, linetype="dashed") +
labs(y="My data") +
theme_bw(base_size = 22)
plot
The data=subset(df, z>=2 & z<=12) removes rows from the dataframe, so that data is 'lost' for geom_ribbon.
Instead of subsetting, an additional condition on the y-value may get you closer to what you want to achieve:
plot<-ggplot(df, aes(x=x, y=y)) +
geom_area(fill="transparent") +
geom_ribbon(data=df, aes(ymin=25, ymax=ifelse((z>=2 & z<=12), ifelse(y < 25, 20, y), 25)), fill = "red", alpha=0.5) +
geom_line() +
geom_hline(yintercept = 25, linetype="dashed") +
labs(y="My data") +
theme_bw(base_size = 22)
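One possible variation, offered only as a sketch: precompute the ribbon's upper edge and clamp it at the threshold with pmax(), so the ribbon collapses to zero height wherever either condition fails, rather than dipping to 20. The threshold, in_window and ymax names are introduced here for illustration, and set.seed() is added only to make the example reproducible. The fill edges will still be angular where the line crosses 25 between two days, so the approx() interpolation mentioned in the question would still be needed for a smooth boundary.

```r
library(ggplot2)

set.seed(123)
x <- seq(as.Date("2011-12-30"), as.Date("2012-02-17"), by = "day")
df <- data.frame(x = x, y = sample(1:50), z = as.numeric(format(x, "%d")))

threshold <- 25

# TRUE for days 2 to 12 of each month
in_window <- df$z >= 2 & df$z <= 12

# Upper edge of the ribbon: follows y only inside the window and above the
# threshold; everywhere else it sits on the threshold (zero-height ribbon)
df$ymax <- ifelse(in_window, pmax(df$y, threshold), threshold)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_ribbon(aes(ymin = threshold, ymax = ymax), fill = "red", alpha = 0.5) +
  geom_line() +
  geom_hline(yintercept = threshold, linetype = "dashed") +
  labs(y = "My data") +
  theme_bw(base_size = 22)
```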
