Create a plot, flipping axes while keeping the same scale - r

I'm plotting some graphs to introduce the concept of mathematical function to highschool students. Right now, I'd like to give them an example of what is NOT a function, by plotting an horizontal parabola:
x <- seq(from = -3, to = 3, by = 0.001)
y <- -x^2 + 5
grafico <- ggplot()+
geom_hline(yintercept = 0)+
geom_vline(xintercept = 0)+
geom_line(mapping = aes(x = x, y = y),color="darkred",size=1)+
theme_light()+
xlab("")+
ylab("")+
scale_x_continuous(breaks = seq(from = -100, to = 100, by = 1))+
scale_y_continuous(breaks = seq(from = -100, to = 100, by = 1))+
coord_flip(ylim = c(-1.5,5.5), xlim = c(-3,3),expand = FALSE)
print(grafico)
Which outputs the following image:
This is quite close to what I want, but I would like both axes' scales to match, to keep things simple for the students. For this, I'd tried using coord_equal, but unluckily, it seems to cancel coord_flip's effects:
x <- seq(from = -3, to = 3, by = 0.001)
y <- -x^2 + 5
grafico <- ggplot()+
geom_hline(yintercept = 0)+
geom_vline(xintercept = 0)+
geom_line(mapping = aes(x = x, y = y),color="darkred",size=1)+
theme_light()+
xlab("")+
ylab("")+
scale_x_continuous(breaks = seq(from = -100, to = 100, by = 1))+
scale_y_continuous(breaks = seq(from = -100, to = 100, by = 1))+
coord_flip(ylim = c(-1.5,5.5), xlim = c(-3,3),expand = FALSE)+
coord_equal()
print(grafico)
My question is: Is there a simple way to include coord_flip functionality into coord_equal?
For example, I know I can get coord_cartesian functionality by using the parameters ylim and xlim.

Based on your use case, it doesn't look like you really need to flip the coordinates: you can just reverse the order of inputs for x & y, and use geom_path() instead of geom_line() to force the plot to follow the order in your inputs.
The ggplot help file states:
geom_path() connects the observations in the order in which they
appear in the data. geom_line() connects them in order of the
variable on the x axis.
ggplot() +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_path(mapping = aes(x = y, y = x), color="darkred", size = 1) + # switch x & y here
theme_light() +
xlab("") +
ylab("") +
scale_x_continuous(breaks = seq(from = -100, to = 100, by = 1)) +
scale_y_continuous(breaks = seq(from = -100, to = 100, by = 1)) +
coord_equal(xlim = c(-1.5, 5.5), ylim = c(-3, 3), expand = FALSE) # switch x & y here

Related

Increase the margin of every second x-axis tick ggplot2

I'm looking for a way to move every second x-axis tick downwards and have the tick line go down with it.
I can change the general margin and tick length for all ticks with:
#MWE
library(ggplot2)
ggplot(cars, aes(dist, speed))+
geom_point()+
theme(
axis.ticks.length.x = unit(15, "pt")
)
But, I would like the x-axis ticks 0, 50, and 100 (i.e., every second tick) to be without the added top margin.
A generalized answer is preferred as my x-axis is categorical and not numerical (and contains 430 ticks, so nothing I can set by hand).
Any ideas?
Edit:
Output should be:
Edit2:
A more intricate example would be:
#MWE
ggplot(diamonds, aes(cut, price, fill = clarity, group = clarity))+
geom_col(position = 'dodge')+
theme(
axis.ticks.length.x = unit(15, "pt")
)
Edit -- added categorical approach at bottom.
Here's a hack. Hope there's a better way!
ticks <- data.frame(
x = 25*0:5,
y = rep(c(-0.2, -2), 3)
)
ggplot(cars, aes(dist, speed))+
geom_point()+
geom_rect(fill = "white", xmin = -Inf, xmax = Inf,
ymin = 0, ymax = -5) +
geom_segment(data = ticks,
aes(x = x, xend = x,
y = 0, yend = y)) +
geom_text(data = ticks,
aes(x = x, y = y, label = x), vjust = 1.5) +
theme(axis.ticks.x = element_blank()) +
scale_x_continuous(breaks = 25*0:5, labels = NULL, name = "") +
coord_cartesian(clip = "off")
Here's a similar approach used with a categorical x.
cats <- sort(as.character(unique(diamonds$cut)))
ticks <- data.frame(x = cats)
ticks$y = ifelse(seq_along(cats) %% 2, -500, -2000)
ggplot(diamonds, aes(cut, price, fill = clarity, group = clarity))+
geom_col(position = 'dodge') +
annotate("rect", fill = "white",
xmin = 0.4, xmax = length(cats) + 0.6,
ymin = 0, ymax = -3000) +
geom_segment(data = ticks, inherit.aes = F,
aes(x = x, xend = x,
y = 0, yend = y)) +
geom_text(data = ticks, inherit.aes = F,
aes(x = x, y = y, label = x), vjust = 1.5) +
scale_x_discrete(labels = NULL, name = "cut") +
scale_y_continuous(expand = expand_scale(mult = c(0, 0.05))) +
theme(axis.ticks.x = element_blank()) +
coord_cartesian(clip = "off")

How to include "think-cell"-like percentage changes in a waterfall-chart generated in ggplot2

I try to establish R as data visualisation tool in my company. A typical graph type used in my department are waterfall charts (https://en.wikipedia.org/wiki/Waterfall_chart).
In R, there are some packages and hints for ggplot to generate a waterfall chart (https://learnr.wordpress.com/2010/05/10/ggplot2-waterfall-charts/), which I used already.
Unfortunately, a common feature for the used waterfall charts are annotations with arrows to indicate the percentage changes within the steps.
See an example below:
Or here in this video (https://www.youtube.com/watch?v=WMHf7uFR6Rk)
The software used to produce such kind of plots is think cell (https://www.think-cell.com/), which is an add-on to Excel and Powerpoint.
The problem I have is that I don't know how to start to tackle the topic. My first thoughts are going in this direction:
Use geom_segment for generating the arrows and boxes
Use ggplot's annotate funktion to place the text at the arrows or in the boxes
Calculate the positions automatically based on the data provided to the waterfall chart.
May I ask you, if you have additional thoughts/ideas to implement such graphs in ggplot?
Best Regards Markus
Here's an example of the approach I would take.
Step 1. Pick which elements should be added, and add them one at a time.
Let's say we're starting with this simple chart:
df <- data.frame(x = c(2007, 2008, 2009),
y = c(100, 120, 140))
ggplot(df, aes(x, y, label = y)) +
geom_col() +
geom_text(vjust = -0.5)
First of all, we need some extra vertical space:
ggplot(df, aes(x, y, label = y)) +
geom_col() +
geom_text(vjust = -0.5) +
scale_y_continuous(expand = expand_scale(add = c(10, 50))) # Add 50 y padding
Now, I incrementally add layers until it looks like I want:
# Semi-manual proof of concept
ggplot(df, aes(x, y, label = y)) +
geom_col() +
geom_text(vjust = -0.5) +
scale_y_continuous(expand = expand_scale(add = c(10, 50))) + # Add 50 y padding
# Line with arrow
geom_segment(aes(x = df$x[3], y = df$y[3] + 50,
xend = df$x[3], yend = df$y[3] + 50),
arrow = arrow(length = unit(0.02, "npc"), type = "closed")) +
# Background box
geom_tile(aes(x = mean(c(df$x[3], df$x[3])),
y = mean(c(df$y[3], df$y[3])) + 50, width = 1, height = 40),
fill = "white", color = "black", size = 0.5) +
# Text
geom_text(aes(x = mean(c(df$x[3], df$x[3])),
y = mean(c(df$y[3], df$y[3])) + 50,
label = paste0("CAGR\n",
df$x[3], "-", df$x[3], "\n",
scales::percent((df$y[3] / df$y[3]) ^ (1/(df$x[3]-df$x[3])) - 1))))
Step 2. Make it into a function
Now I move the CAGR-related layers into a function, replacing most of the constants with function parameters.
add_CAGR <- function(df, first_val_pos, second_val_pos,
y_offset, box_width = 1, box_height) {
list(
# Line with arrow
geom_segment(aes(x = df$x[first_val_pos],
xend = df$x[second_val_pos],
y = df$y[first_val_pos] + y_offset,
yend = df$y[second_val_pos] + y_offset),
arrow = arrow(length = unit(0.02, "npc"), type = "closed")),
# Background box
geom_tile(aes(x = mean(c(df$x[first_val_pos], df$x[second_val_pos])),
y = mean(c(df$y[first_val_pos], df$y[second_val_pos])) + y_offset,
width = box_width, height = box_height),
fill = "white", color = "black", size = 0.5),
# Text
geom_text(aes(x = mean(c(df$x[first_val_pos], df$x[second_val_pos])),
y = mean(c(df$y[first_val_pos], df$y[second_val_pos])) + y_offset,
label = paste0("CAGR\n",
df$x[first_val_pos], "-", df$x[second_val_pos], "\n",
scales::percent((df$y[second_val_pos] / df$y[1]) ^
(1/(df$x[second_val_pos]-df$x[first_val_pos])) - 1))),
lineheight = 0.8)
)
}
Step 3: Use in plot
ggplot(df, aes(x, y, label = y)) +
geom_col() +
geom_text(vjust = -0.5) +
scale_y_continuous(expand = expand_scale(add = c(0, 50))) + # Add 50 y padding
add_CAGR(df, first_val_pos = 1, second_val_pos = 3,
y_offset = 50,
box_width = 0.7, box_height = 40)
Or the same thing just between the first two bars:
ggplot(df, aes(x, y, label = y)) +
geom_col() +
geom_text(vjust = -0.5) +
scale_y_continuous(expand = expand_scale(add = c(0, 50))) + # Add 50 y padding
add_CAGR(df, first_val_pos = 1, second_val_pos = 2,
y_offset = 50,
box_width = 0.7, box_height = 40)

Using ggrepel with single plot point/adding line between label and point

Ok so I have a data set with 2 variables X and Y, and an ID variable. I've created a full plot using this code:
ggplot(data = X_Y) +
geom_point(mapping = aes(x = X, y = Y))+
geom_text_repel(mapping = aes(x = X, y = Y, label = ID))+
xlim(0,100)+
ylim(0,100)
This produces a plot like this:
I now wish to create a number of separate plots only showing a single data point at a time with their label.
Now I can use just geom_label without repel and nudge the label to get this:
While this plot is ok, I was wondering if there was any way to keep the lines connecting labels to points like how ggrepel does...
EDIT
From the first suggestion, when I try use repel with only one case selected I get the following plot:
ggplot(data = X_Y) +
geom_point(aes(x = X[4], y = Y[4]))+
geom_label_repel(aes(x = X[4], y = Y[4]),
label = "You are here",
min.segment.length = unit(0, 'lines'),
nudge_y = 6)+
labs(x = "X",y = "Y",title = "mytitle")+
scale_x_continuous(limits = c(0, 100)) +
scale_y_continuous(limits = c(0, 100))
RESOLVED
Figured it out! I need to specify my data in ggplot() to only be the X and Y variables and limit to the row of interest.
Like this:
ggplot(data = X_Y[4,c(3,4)) +
geom_point(aes(x = X, y = Y))+
geom_label_repel(aes(x = X, y = Y),
label = "You are here",
min.segment.length = unit(0, 'lines'),
nudge_y = 6)+
labs(x = "X",y = "Y",title = "mytitle")+
scale_x_continuous(limits = c(0, 100)) +
scale_y_continuous(limits = c(0, 100))
You can of course still use geom_label_repel, even with a single point. To be sure a segment is drawn adjust the min.segment.length arg. This arg sets the minimum distance from the point to the label to draw a segment, setting it to unit(0, 'lines') ensures every segment is drawn:
library(ggplot2)
library(ggrepel)
ggplot(data.frame(x = 2, y = 3)) +
geom_point(aes(x, y)) +
geom_label_repel(aes(x, y),
label = 'You are here',
min.segment.length = unit(0, 'lines'),
nudge_y = .2) +
scale_x_continuous(limits = c(0, 3)) +
scale_y_continuous(limits = c(0, 4))

Showing median value in grouped boxplot in R

I have created boxplots using ggplot2 with this code.
plotgraph <- function(x, y, colour, min, max)
{
plot1 <- ggplot(dims, aes(x = x, y = y, fill = Region)) +
geom_boxplot()
#plot1 <- plot1 + scale_x_discrete(name = "Blog Type")
plot1 <- plot1 + labs(color='Region') + geom_hline(yintercept = 0, alpha = 0.4)
plot1 <- plot1 + scale_y_continuous(breaks=c(seq(min,max,5)), limits = c(min, max))
plot1 <- plot1 + labs(x="Blog Type", y="Dimension Score") + scale_fill_grey(start = 0.3, end = 0.7) + theme_grey()
plot1 <- plot1 + theme(legend.justification = c(1, 1), legend.position = c(1, 1))
return(plot1)
}
plot1 <- plotgraph (Blog, Dim1, Region, -30, 25)
A part of data I use is reproduced here.
Blog,Region,Dim1,Dim2,Dim3,Dim4
BlogsInd.,PK,-4.75,13.47,8.47,-1.29
BlogsInd.,PK,-5.69,6.08,1.51,-1.65
BlogsInd.,PK,-0.27,6.09,0.03,1.65
BlogsInd.,PK,-2.76,7.35,5.62,3.13
BlogsInd.,PK,-8.24,12.75,3.71,3.78
BlogsInd.,PK,-12.51,9.95,2.01,0.21
BlogsInd.,PK,-1.28,7.46,7.56,2.16
BlogsInd.,PK,0.95,13.63,3.01,3.35
BlogsNews,PK,-5.96,12.3,6.5,1.49
BlogsNews,PK,-8.81,7.47,4.76,1.98
BlogsNews,PK,-8.46,8.24,-1.07,5.09
BlogsNews,PK,-6.15,0.9,-3.09,4.94
BlogsNews,PK,-13.98,10.6,4.75,1.26
BlogsNews,PK,-16.43,14.49,4.08,9.91
BlogsNews,PK,-4.09,9.88,-2.79,5.58
BlogsNews,PK,-11.06,16.21,4.27,8.66
BlogsNews,PK,-9.04,6.63,-0.18,5.95
BlogsNews,PK,-8.56,7.7,0.71,4.69
BlogsNews,PK,-8.13,7.26,-1.13,0.26
BlogsNews,PK,-14.46,-1.34,-1.17,14.57
BlogsNews,PK,-4.21,2.18,3.79,1.26
BlogsNews,PK,-4.96,-2.99,3.39,2.47
BlogsNews,PK,-5.48,0.65,5.31,6.08
BlogsNews,PK,-4.53,-2.95,-7.79,-0.81
BlogsNews,PK,6.31,-9.89,-5.78,-5.13
BlogsTech,PK,-11.16,8.72,-5.53,8.86
BlogsTech,PK,-1.27,5.56,-3.92,-2.72
BlogsTech,PK,-11.49,0.26,-1.48,7.09
BlogsTech,PK,-0.9,-1.2,-2.03,-7.02
BlogsTech,PK,-12.27,-0.07,5.04,8.8
BlogsTech,PK,6.85,1.27,-11.95,-10.79
BlogsTech,PK,-5.21,-0.89,-6,-2.4
BlogsTech,PK,-1.06,-4.8,-8.62,-2.42
BlogsTech,PK,-2.6,-4.58,-2.07,-3.25
BlogsTech,PK,-0.95,2,-2.2,-3.46
BlogsTech,PK,-0.82,7.94,-4.95,-5.63
BlogsTech,PK,-7.65,-5.59,-3.28,-0.54
BlogsTech,PK,0.64,-1.65,-2.36,-2.68
BlogsTech,PK,-2.25,-3,-3.92,-4.87
BlogsTech,PK,-1.58,-1.42,-0.38,-5.15
Columns,PK,-5.73,3.26,0.81,-0.55
Columns,PK,0.37,-0.37,-0.28,-1.56
Columns,PK,-5.46,-4.28,2.61,1.29
Columns,PK,-3.48,2.38,12.87,3.73
Columns,PK,0.88,-2.24,-1.74,3.65
Columns,PK,-2.11,4.51,8.95,2.47
Columns,PK,-10.13,10.73,9.47,-0.47
Columns,PK,-2.08,1.04,0.11,0.6
Columns,PK,-4.33,5.65,2,-0.77
Columns,PK,1.09,-0.24,-0.92,-0.17
Columns,PK,-4.23,-4.01,-2.32,6.26
Columns,PK,-1.46,-1.53,9.83,5.73
Columns,PK,9.37,-1.32,1.27,-4.12
Columns,PK,5.84,-2.42,-5.21,1.07
Columns,PK,8.21,-9.36,-5.87,-3.21
Columns,PK,7.34,-7.3,-2.94,-5.86
Columns,PK,1.83,-2.77,1.47,-4.02
BlogsInd.,PK,14.39,-0.55,-5.42,-4.7
BlogsInd.,US,22.02,-1.39,2.5,-3.12
BlogsInd.,US,4.83,-3.58,5.34,9.22
BlogsInd.,US,-3.24,2.83,-5.3,-2.07
BlogsInd.,US,-5.69,15.17,-14.27,-1.62
BlogsInd.,US,-22.92,4.1,5.79,-3.88
BlogsNews,US,0.41,-2.03,-6.5,2.81
BlogsNews,US,-4.42,8.49,-8.04,2.04
BlogsNews,US,-10.72,-4.3,3.75,11.74
BlogsNews,US,-11.29,2.01,0.67,8.9
BlogsNews,US,-2.89,0.08,-1.59,7.06
BlogsNews,US,-7.59,8.51,3.02,12.33
BlogsNews,US,-7.45,23.51,2.79,0.48
BlogsNews,US,-12.49,15.79,-9.86,18.29
BlogsTech,US,-11.59,6.38,11.79,-7.28
BlogsTech,US,-4.6,4.12,7.46,3.36
BlogsTech,US,-22.83,2.54,10.7,5.09
BlogsTech,US,-4.83,3.37,-8.12,-0.9
BlogsTech,US,-14.76,29.21,6.23,9.33
Columns,US,-15.93,12.85,19.47,-0.88
Columns,US,-2.78,-1.52,8.16,0.24
Columns,US,-16.39,13.08,11.07,7.56
Even though I have tried to add detailed scale on y-axis, it is hard for me to pinpoint exact median score for each boxplot. So I need to print median value within each boxplot. There was another answer available (for faceted boxplot) which does not work for me as the printed values are not within the boxes but jammed together in the middle. It will be great to be able to print them within (middle and above the median line of) boxplots.
Thanks for your help.
Edit: I make a grouped graph as below.
Add
library(dplyr)
dims=dims%>%
group_by(Blog,Region)%>%
mutate(med=median(Dim1))
plotgraph <- function(x, y, colour, min, max)
{
plot1 <- ggplot(dims, aes(x = x, y = y, fill = Region)) +
geom_boxplot()+
labs(color='Region') +
geom_hline(yintercept = 0, alpha = 0.4)+
scale_y_continuous(breaks=c(seq(min,max,5)), limits = c(min, max))+
labs(x="Blog Type", y="Dimension Score") + scale_fill_grey(start = 0.3, end = 0.7) +
theme_grey()+
theme(legend.justification = c(1, 1), legend.position = c(1, 1))+
geom_text(aes(y = med,x=x, label = round(med,2)),position=position_dodge(width = 0.8),size = 3, vjust = -0.5,colour="blue")
return(plot1)
}
plot1 <- plotgraph (Blog, Dim1, Region, -30, 25)
Which gives (the text colour can be tweaked to something less tacky):
Note: You should consider using non-standard evaluation in your function rather than having it require the use of attach()
Edit:
One liner, not as clean I wanted it to be since I ran into problems with dplyr not properly aggregating the data even though it says the grouping was performed.
This function assume the dataframe is always called dims
library(ggplot2)
library(reshape2)
plotgraph <- function(x, y, colour, min, max)
{
plot1 <- ggplot(dims, aes_string(x = x, y = y, fill = colour)) +
geom_boxplot()+
labs(color=colour) +
geom_hline(yintercept = 0, alpha = 0.4)+
scale_y_continuous(breaks=c(seq(min,max,5)), limits = c(min, max))+
labs(x="Blog Type", y="Dimension Score") +
scale_fill_grey(start = 0.3, end = 0.7) +
theme_grey()+
theme(legend.justification = c(1, 1), legend.position = c(1, 1))+
geom_text(data= melt(with(dims, tapply(eval(parse(text=y)),list(eval(parse(text=x)),eval(parse(text=colour))), median)),varnames=c("Blog","Region"),value.name="med"),
aes_string(y = "med",x=x, label = "med"),position=position_dodge(width = 0.8),size = 3, vjust = -0.5,colour="blue")
return(plot1)
}
plot1 <- plotgraph ("Blog", "Dim1", "Region", -30, 25)
Assuming that Blog is your dataframe, the following should work:
min <- -30
max <- 25
meds <- aggregate(Dim1~Region, Blog, median)
plot1 <- ggplot(Blog, aes(x = Region, y = Dim1, fill = Region)) +
geom_boxplot()
plot1 <- plot1 + labs(color='Region') + geom_hline(yintercept = 0, alpha = 0.4)
plot1 <- plot1 + scale_y_continuous(breaks=c(seq(min,max,5)), limits = c(min, max))
plot1 <- plot1 + labs(x="Blog Type", y="Dimension Score") + scale_fill_grey(start = 0.3, end = 0.7) + theme_grey()
plot1 + theme(legend.justification = c(1, 1), legend.position = c(1, 1)) +
geom_text(data = meds, aes(y = Dim1, label = round(Dim1,2)),size = 5, vjust = -0.5, color='white')

ggplot2: normal distribution curve at bottom of graph - geom_text() + geom_historigram()

I'm trying to add a normal distribution line to my chart. But it simply becomes flat at the bottom for some reason.
My code
MyChart <- function(x) {
ggplot(x, aes( x = max.DrawD, y = cum.Return, label = Symbol)) +
scale_y_continuous(breaks = c(seq(0, 10, 1)), limits = c(0,10)) + # outliers excluded
scale_x_continuous(limit =c(0, 0.5)) +
geom_histogram(aes(y = ..density..), binwidth = 0.02) +
geom_text(size = 3) +
stat_function(fun = dnorm, colour = 'firebrick') +
theme_classic()
}
As you can see, the red line (my stat_function() code) is right at the bottom of the graph. How can I resolve this?
UPDATE: So I solved it. But I don't know why it now works. Just added a manual spec on the mean and standard deviation.
Updated code
MyChart <- function(x) {
ggplot(x, aes( x = max.DrawD, y = cum.Return, label = Symbol)) +
scale_y_continuous(breaks = c(seq(0, 10, 1)), limits = c(0,10)) + # outliers excluded
scale_x_continuous(limit =c(0, 0.5)) +
geom_histogram(aes(y = ..density..), binwidth = 0.02) +
geom_text(size = 3) +
stat_function(fun = dnorm, args = list(mean = mean(x$max.DrawD), sd = sd(x$max.DrawD)), colour = 'firebrick') +
theme_classic()
}
From #user20650: it works as the function dnorm needs parameters mean and sd. If you dont specify them they are assumed to be zero and one

Resources