ggplot plot - reorder variable & alter line thickness/colour - r

This code creates a basic plot, but I can't work out how to sort the categories by their value (fct_reorder is included, but I must have done something wrong). I would also like to colour the lines and make them thicker.
library(tidyverse)
dat2 <- tibble(Percentage = c(12.5,58.9,9.1,3.6,7.3,7.3),
ICDDx = c("Dx1","Dx2","Dx3","Dx4","Dx5","Dx6"))
library(ggplot2)
ggplot(dat2, aes(Percentage,ICDDx, fct_reorder(Percentage))) +
geom_segment(aes(x = 0, y = ICDDx, xend = Percentage,
yend = ICDDx), color = "grey50") +
geom_point(size=6)
I tried to specify geom_line(size = 3), but received this error:
Error: `data` must be a data frame, or other object coercible by
`fortify()`, not an S3 object with class LayerInstance/Layer/ggproto/gg

Just use geom_lollipop():
library(tidyverse)
dat2 <- tibble(Percentage = c(12.5,58.9,9.1,3.6,7.3,7.3),
ICDDx = c("Dx1","Dx2","Dx3","Dx4","Dx5","Dx6"))
mutate(dat2, ICDDx = fct_reorder(ICDDx, Percentage)) %>%
mutate(Percentage = Percentage/100) %>%
ggplot() +
ggalt::geom_lollipop(
aes(Percentage, ICDDx), horizontal=TRUE,
colour = "#6a3d9a", size = 2,
point.colour = "#ff7f00", point.size = 4
) +
hrbrthemes::scale_x_percent(
expand=c(0,0.01), position = "top", limits = c(0,0.6)
) +
labs(
x = NULL, y = NULL
) +
hrbrthemes::theme_ipsum_rc(grid="X")

Here is my answer based on my interpretation of your question.
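dat2 %>%
  # arrange() alone does not change the order of a discrete axis; reorder the factor levels instead
  mutate(ICDDx = fct_reorder(ICDDx, Percentage)) %>%
  ggplot(aes(Percentage, ICDDx, col = ICDDx)) +
  # size is set outside aes() so it acts as a constant, not a mapped variable
  geom_segment(aes(x = 0, y = ICDDx, xend = Percentage, yend = ICDDx), size = 2) +
  geom_point(size = 6)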

You could do a ranking first.
dat2 <- dat2[order(dat2$Percentage), ] # order by percentage
dat2$rank <- 1:nrow(dat2) # add ranking variable
ggplot(dat2, aes(x=Percentage, y=rank, group=rank, color=ICDDx)) +
geom_segment(aes(x=0, y=rank, xend=Percentage,
yend=rank), col="grey50", size=2) +
geom_point(size=6) +
scale_y_continuous(breaks=1:length(dat2$ICDDx), labels=dat2$ICDDx) + # optional
scale_color_discrete(labels=dat2$ICDDx)
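For reference, here is a minimal sketch that needs only ggplot2 and forcats and covers both parts of the question: ordering the categories by value and styling the lines. The colour names and widths are arbitrary choices for illustration, and `linewidth` assumes ggplot2 3.4.0 or later (older versions use `size` for line thickness).

```r
library(ggplot2)
library(forcats)

dat2 <- data.frame(
  Percentage = c(12.5, 58.9, 9.1, 3.6, 7.3, 7.3),
  ICDDx = c("Dx1", "Dx2", "Dx3", "Dx4", "Dx5", "Dx6")
)

# Reorder the factor levels of ICDDx by Percentage; the axis follows level order
dat2$ICDDx <- fct_reorder(dat2$ICDDx, dat2$Percentage)

p <- ggplot(dat2, aes(x = Percentage, y = ICDDx)) +
  # colour and linewidth are constants, so they go outside aes()
  geom_segment(aes(x = 0, xend = Percentage, yend = ICDDx),
               colour = "steelblue", linewidth = 2) +
  geom_point(size = 6, colour = "darkorange")

p
```

The key point is that fct_reorder() is applied to the category column (with the numeric column as the sorting key), not passed as an extra argument to aes().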

Related

Define custom transformation of ggplot axis labels with trans_new function

I am working on percentage changes between periods and struggling with a logarithmic transformation of the axis labels. Here is an example based on the storms dataset:
library(dplyr)
library(ggplot2)
library(scales)
df <- storms |>
group_by(year) |>
summarise(wind = mean(wind)) |>
mutate(lag = lag(wind, n = 1)) |>
mutate(perc = (wind / lag) - 1) |>
tidyr::drop_na()
I want to visualize the distribution of percentages, making the percentage change symmetrical (log difference) with log1p.
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5)
[Figure: histogram with log1p values on the x-axis]
At this point I wanted to transform the x-axis label back to the original percentage value.
I tried to create my own transformation with trans_new, and applied it to the labels in scale_x_continuous, but I can't make it work.
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p_trans(),
inverse = function(x)
expm1(x),
breaks = breaks_log(),
format = percent_format(),
domain = c(-Inf, Inf)
)
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous(labels = trans_perc)
Currently, the result is:
Error in get_labels():
! breaks and labels are different lengths
Run rlang::last_error() to see where the error occurred.
Thanks!
EDIT
I am adding details on the different output I am getting from Alan's first answer:
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p,
inverse = expm1,
breaks = pretty_breaks(5),
format = percent_format(),
domain = c(-Inf, Inf)
)
library(ggpubr)
a <- ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5)
b <- ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
c <- ggplot(df, aes(x = perc)) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
ggarrange(a, b, c,
ncol = 3,
labels = c("Log on Value only",
"Log on Value and X",
"Log on X only"))
Different outcomes: https://i.stack.imgur.com/dCW2m.png
If I understand you correctly, you want to keep the shape of the histogram, but change the labels so that they reflect the value of the perc column rather than the transformed log1p(perc) value. If that is the case, there is no need for a transformer object. You can simply pass the inverse transformation (plus formatting) as a function to the labels argument of scale_x_continuous.
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous("Percentage Change",
breaks = log1p(pretty(df$perc, 5)),
labels = ~ percent(expm1(.x)))
Note that although the histogram remains symmetrical in shape, the axis labels represent the back-transformed values of the original axis labels.
The point of a transformer object is to do all this for you without having to pass a transformed data set (i.e. without having to pass log1p(perc)). So in your case, you could do:
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p,
inverse = expm1,
format = percent_format(),
domain = c(-Inf, Inf)
)
ggplot(df, aes(x = perc)) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
This gives essentially the same result.
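To see what the transformer object does with the data, here is a small sketch that calls its transform and inverse functions directly. The input vector `perc` is made up for illustration; only `trans_new()`, `percent_format()`, and `percent()` from scales are used.

```r
library(scales)

trans_perc <- trans_new(
  name = "trans_perc",
  transform = log1p,   # applied to the data before drawing
  inverse = expm1,     # applied to break positions before labelling
  format = percent_format()
)

# Hypothetical percentage changes (as proportions)
perc <- c(-0.10, 0, 0.25)

z <- trans_perc$transform(perc)   # positions on the transformed axis
back <- trans_perc$inverse(z)     # back on the original scale

# The round trip recovers the original values (up to floating-point error)
all.equal(back, perc)

# Labels are built from the back-transformed values
percent(back)
```

This is why the first approach passes `~ percent(expm1(.x))` as the labels function: the axis positions are log1p values, and expm1 maps them back before formatting.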

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have two questions as a result:
1. Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
2. Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
# after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
ggplot(df, aes(x = log(amount), y = after_stat(count), label = after_stat(count))) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situation I can think of) use the summary calculations of other layers, so I think the simplest approach is to replicate the calculation using stat_bin(geom = "text", ...).
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT: to label the buckets on the original (untransformed) scale, we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
scale_x_continuous(labels = ~scales::comma(exp(.x)),  # log() is the natural log, so invert with exp()
minor_breaks = NULL)
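As a rough check on what the pre-calculated buckets mean on the original scale, the sketch below does the same rounding in base R and converts each bucket's edges back with exp(). The column names `lower` and `upper` are just illustrative.

```r
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)

# Same bucketing as count(log_amt = round(log(amount)))
bucket <- round(log(amount))
counts <- table(bucket)
centers <- as.numeric(names(counts))

# Each bucket covers [center - 0.5, center + 0.5) on the natural-log scale
buckets <- data.frame(
  log_bucket = centers,
  lower = exp(centers - 0.5),
  upper = exp(centers + 0.5),
  n = as.integer(counts)
)

buckets
```

So a label on a bar gives the number of rows in that bucket, not an individual amount; showing the individual values would need one text label per row instead of per bin.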

R ggplot2 - Add a ribbon for only part of the x axis

Say I have two datasets. One that contains two months of data:
units_sold <- data.frame(date = seq(as.Date("2017-05-01"), as.Date("2017-07-01"), 1),
units = rep(20,62),
category = "units_sold")
And one that contains just a week:
forecast <- data.frame(date = seq(as.Date("2017-06-12"), as.Date("2017-06-18"), 1),
units = 5,
category = "forecast")
I can put them on the same plot. I.e.,
joined <- rbind(units_sold, forecast)
ggplot(data = joined, aes(x=date, y=units, colour = category)) + geom_line()
However, I can't seem to figure out how to put a ribbon between the two lines.
This is what I'm trying:
library(dplyr)
ribbon_dat <- left_join(forecast, units_sold, by = "date") %>%
rename(forecast = units.x) %>%
rename(units_sold = units.y) %>%
select(-c(category.x, category.y))
ggplot(data = joined, aes(x=date, y=units, colour = category)) +
geom_line() +
geom_ribbon(aes(x=ribbon_dat$date, ymin=ribbon_dat$forecast, ymax=ribbon_dat$units_sold))
I get this error: Error: Aesthetics must be either length 1 or the same as the data (69): x, ymin, ymax, y, colour
You are very close: you need to pass the second dataset to the data argument of geom_ribbon(), and refer to its columns by name inside aes() rather than with ribbon_dat$.
ggplot(data = joined, aes(x = date)) +
geom_line(aes(y = units, colour = category)) +
geom_ribbon(
data = ribbon_dat,
mapping = aes(ymin = forecast, ymax = units_sold)
)
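A self-contained variant of the same idea, sketched with base R's merge() instead of left_join(): every layer gets its own data frame, and only the shared x aesthetic is set globally. The suffixes and the alpha value are arbitrary choices for this example.

```r
library(ggplot2)

units_sold <- data.frame(
  date = seq(as.Date("2017-05-01"), as.Date("2017-07-01"), by = "day"),
  units = 20
)
forecast <- data.frame(
  date = seq(as.Date("2017-06-12"), as.Date("2017-06-18"), by = "day"),
  units = 5
)

# One row per forecast date, with both series side by side
ribbon_dat <- merge(forecast, units_sold, by = "date",
                    suffixes = c("_forecast", "_sold"))

p <- ggplot(mapping = aes(x = date)) +
  geom_line(data = units_sold, aes(y = units)) +
  geom_line(data = forecast, aes(y = units)) +
  # The ribbon only spans the dates present in ribbon_dat
  geom_ribbon(data = ribbon_dat,
              aes(ymin = units_forecast, ymax = units_sold),
              alpha = 0.3)

p
```

Because the ribbon layer has its own seven-row data frame, its aesthetics no longer need to match the 69 rows of the combined data.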

Separate boxes for two grouping variables when color by only one variable

Here is an example from the geom_boxplot man page:
p = ggplot(mpg, aes(class, hwy))
p + geom_boxplot(aes(colour = drv))
I would like to make a very similar plot, but with (yearmon formatted) dates where the class variable is in the example, and a factor variable where drv is in the example.
Here is some sample data:
df_box = data_frame(
Date = sample(
as.yearmon(seq.Date(from = as.Date("2013-01-01"), to = as.Date("2016-08-01"), by = "month")),
size = 10000,
replace = TRUE
),
Source = sample(c("Inside", "Outside"), size = 10000, replace = TRUE),
Value = rnorm(10000)
)
I have tried a bunch of different things:
1. Put an as.factor around the date variable, but then I no longer have the nicely spaced-out date scale for the x-axis:
df_box %>%
ggplot(aes(
x = as.factor(Date),
y = Value,
# group = Date,
color = Source
)) +
geom_boxplot(outlier.shape = NA) +
theme_bw() +
xlab("Month Year") +
theme(
axis.text.x = element_text(hjust = 1, angle = 50)
)
2. On the other hand, if I use Date as an additional group variable as suggested here, adding color no longer has any effect:
df_box %>%
ggplot(aes(
x = Date,
y = Value,
group = Date,
color = Source
)) +
geom_boxplot() +
theme_bw()
Any ideas as to how to achieve the output of #1 while still maintaining a yearmon scale on the x-axis?
Since you need separate boxes for each combination of Date and Source, use interaction(Source, Date) as the group aesthetic:
ggplot(df_box, aes(x = Date, y = Value,
colour = Source,
group = interaction(Source, Date))) +
geom_boxplot()
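The same pattern can be tried without the zoo dependency. This sketch uses the built-in mpg data, with the numeric `year` column standing in for the continuous date axis and `drv` standing in for Source; that substitution is an assumption made purely for illustration.

```r
library(ggplot2)

# Each level of the interaction is one box: one per (drv, year) combination
groups <- interaction(mpg$drv, mpg$year, drop = TRUE)
nlevels(groups)

p <- ggplot(mpg, aes(x = year, y = hwy,
                     colour = drv,
                     group = interaction(drv, year))) +
  geom_boxplot()

p
```

With `group = year` alone, all rows for a year fall into a single box regardless of colour, which is why adding colour appeared to do nothing in the second attempt.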

Most succinct way to label/annotate extreme values with ggplot?

I'd like to annotate all y-values greater than a y-threshold using ggplot2.
When you plot(lm(y~x)), using the base package, the second graph that pops up automatically is Residuals vs Fitted, the third is qqplot, and the fourth is Scale-location. Each of these automatically label your extreme Y values by listing their corresponding X value as an adjacent annotation. I'm looking for something like this.
What's the best way to achieve this base-default behavior using ggplot2?
Updated: scale_size_area() is now used in place of scale_area().
You might be able to take something from this to suit your needs.
library(ggplot2)
#Some data
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)
df.fortified = fortify(m1)
names(df.fortified) # Names of the variables containing residuals and derived quantities
# Select extreme values
df.fortified$extreme = ifelse(abs(df.fortified$`.stdresid`) > 1.5, 1, 0)
# Based on examples on page 173 in Wickham's ggplot2 book
plot = ggplot(data = df.fortified, aes(x = x, y = .stdresid)) +
geom_point() +
geom_text(data = df.fortified[df.fortified$extreme == 1, ],
aes(label = x, x = x, y = .stdresid), size = 3, hjust = -.3)
plot
plot1 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid)) +
geom_point() + geom_smooth(se = FALSE)
plot2 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid, size = .cooksd)) +
geom_point() + scale_size_area("Cook's distance") + geom_smooth(se = FALSE, show.legend = FALSE)  # show.legend replaces the old show_guide argument
library(gridExtra)
grid.arrange(plot1, plot2)
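A shorter sketch of the same idea that avoids fortify() and builds the diagnostics with base R's fitted() and rstandard(). The 1.5 threshold is carried over from the code above; labelling by row number (`obs`) is an assumption chosen to resemble the base plot(lm) behaviour.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)

# One row per observation: fitted value, standardized residual, row number
diag_df <- data.frame(
  fitted = fitted(m1),
  stdresid = rstandard(m1),
  obs = seq_len(nrow(df))
)

# Keep only the points beyond the threshold for labelling
extreme <- diag_df[abs(diag_df$stdresid) > 1.5, ]

p <- ggplot(diag_df, aes(x = fitted, y = stdresid)) +
  geom_point() +
  # The text layer gets the subset as its own data, so only extremes are labelled
  geom_text(data = extreme, aes(label = obs), hjust = -0.3, size = 3)

p
```

Swapping `label = obs` for another column changes what is printed next to each extreme point.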

Resources