big dotted geom_line, but dots closer together - r

I am curious if it is possible to increase the size of dots on a dotted line within geom_line but keep the dots closer together. The R code below produces a basic reproducible example of what I see, and then I include another chart showing what I would like to see.
library(dplyr)
library(ggplot2)
set.seed(5223)
myDF <- data.frame(x=rnorm(20,0,1),
y=runif(20,0,20))
myDF <- myDF %>%
mutate(From8to12 = y>=8 & y<=12)
ggplot(myDF,aes(x=x,y=y,col=From8to12)) +
geom_point() +
geom_hline(yintercept=8,lty="dotted") +
geom_hline(yintercept=12,lty="dotted",size=1.5)
Unedited Image
(Manually) Edited Image (In Paint)
I would like to make the dots bigger, but closer together. Is this possible? I've found nothing online.

It's something like lty in base R plot, and one way is to specify specifically how long should the dash and gap be:
ggplot(myDF,aes(x=x,y=y,col=From8to12)) +
geom_point() +
geom_hline(yintercept=8,lty="dotted") +
geom_hline(yintercept=12,lty="11",size=1.5)
the "11" means length of 1 for dash, length of 1 for gap, and it will replicate this. you can see more about it here

Edit
I thought it wasn't possible without some severe hacking of the underlying draw functions - #StupidWolf's answer proved me wrong. My suggestion was to draw every single point with geom_point and shape = 15 (filled square). It is then a matter of your final plot size, what parameters you chose (i.e., the distance between the 'dots' and their size)
P.S. It's impressive that you have actually managed to produce your image in paint.
library(tidyverse)
set.seed(5223)
myDF <- data.frame(x = rnorm(20, 0, 1), y = runif(20, 0, 20))
dot_dis <- 0.05
x_line <- seq(min(myDF$x), max(myDF$x), dot_dis)
y_line <- 12
ggplot() +
geom_point(aes(x, y), data = myDF) +
geom_point(aes(x_line, y_line), shape = 15, size = 1.5)
Created on 2020-02-18 by the reprex package (v0.3.0)

Related

R, ggplot, How do I keep related points together when using jitter?

One of the variables in my data frame is a factor denoting whether an amount was gained or spent. Every event has a "gain" value; there may or may not be a corresponding "spend" amount. Here is an image with the observations overplotted:
Adding some random jitter helps visually, however, the "spend" amounts are divorced from their corresponding gain events:
I'd like to see the blue circles "bullseyed" in their gain circles (where the "id" are equal), and jittered as a pair. Here are some sample data (three days) and code:
library(ggplot2)
ccode<-c(Gain="darkseagreen",Spend="darkblue")
ef<-data.frame(
date=as.Date(c("2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03")),
site=c("Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace","Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace"),
id=c("C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99","C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99"),
gainspend=c("Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend"),
amount=c(6,14,34,31,3,10,6,14,2,16,16,14,1,1,15,11,8,7,2,10,15,4,3,NA,NA,4,5,NA,NA,NA,NA,NA,NA,2,NA,1,NA,3,NA,NA,2,NA,NA,2,NA,3))
#▼ 3 day, points centered
ggplot(ef,aes(date,site)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
#▼ 3 day, jitted
ggplot(ef,aes(date,site)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5,position=position_jitter(w=0,h=0.2)) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
My main idea is the old "add jitter manually" approach. I'm wondering if a nicer approach could be something like plotting little pie charts as points a la package scatterpie.
In this case you could add a random number for the amount of jitter to each ID so points within groups will be moved the same amount. This takes doing work outside of ggplot2.
First, draw the "jitter" to add for each ID. Since a categorical axis is 1 unit wide, I choose numbers between -.3 and .3. I use dplyr for this work and set the seed so you will get the same results.
library(dplyr)
set.seed(16)
ef2 = ef %>%
group_by(id) %>%
mutate(jitter = runif(1, min = -.3, max = .3)) %>%
ungroup()
Then the plot. I use a geom_blank() layer so that the categorical site axis is drawn before I add the jitter. I convert site to be numeric from a factor and add the jitter on; this only works for factors so luckily categorical axes in ggplot2 are based on factors.
Now paired ID's move together.
ggplot(ef2, aes(x = date, y = site)) +
geom_blank() +
geom_point(aes(size = amount, color = gainspend,
y = as.numeric(factor(site)) + jitter),
alpha=0.5) +
scale_color_manual(values = ccode) +
scale_size_continuous(range = c(1, 15), breaks = c(5, 10, 20))
#> Warning: Removed 15 rows containing missing values (geom_point).
Created on 2021-09-23 by the reprex package (v2.0.0)
You can add some jitter by id outside the ggplot() call.
jj <- data.frame(id = unique(ef$id), jtr = runif(nrow(ef), -0.3, 0.3))
ef <- merge(ef, jj, by = 'id')
ef$sitej <- as.numeric(factor(ef$site)) + ef$jtr
But you need to make site integer/numeric to do this. So when it comes to making the plot, you need to manually add axis labels with scale_y_continuous(). (Update: the geom_blank() trick from aosmith above is a better solution!)
ggplot(ef,aes(date,sitej)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20)) +
scale_y_continuous(breaks = 1:3, labels= sort(unique(ef$site)))
This seems to work, but there are still a few gain/spend circles without a partner--perhaps there is a problem with the id variable.
Perhaps someone else has a better approach!

Can I set free scales for aesthetics other than x and y (e.g. size) when using facet_grid?

facet_grid and facet_wrap have the scales parameter, which as far as I know allows each plot to adjust the scales of the x and/or y axis to the data being plotted. Since according to the grammar of ggplot x and y are just two among many aesthetics, and there's a scale for each aesthetic, I figured it would be reasonable to have the option of letting each aesthetic be free, but so far I didn't find a way to do it.
I was trying to set it in particular for the Size, since sometimes a variables lives in a different order of magnitude depending on the group I'm using for the facet, and having the same scale for every group blocks the possibility of seeing within-group variation.
A reproducible example:
set.seed(1)
x <- runif(20,0,1)
y <- runif(20,0,1)
groups <- c(rep('small', 10), rep('big', 10))
size_small <- runif(10,0,1)
size_big <- runif(10,0,1) * 1000
df <- data.frame(x, y, groups, sizes = c(size_small, size_big))
And an auxiliary function for plotting:
basic_plot <- function(df) ggplot(df) +
geom_point(aes(x, y, size = sizes, color = groups)) +
scale_color_manual(values = c('big' = 'red', 'small' = 'blue')) +
coord_cartesian(xlim=c(0,1), ylim=c(0,1))
If I we plot the data as is, we get the following:
basic_plot(df)
Non faceted plot
The blue dots are relatively small, but there is nothing we can do.
If we add the facet:
basic_plot(df) +
facet_grid(~groups, scales = 'free')
Faceted plot
The blue dots continue being small. But I would like to take advantage of the fact that I'm dividing the data in two, and allow the size scale to adjust to the data of each plot. I would like to have something like the following:
plot_big <- basic_plot(df[df$groups == 'big',])
plot_small <- basic_plot(df[df$groups == 'small',])
grid.arrange(plot_big, plot_small, ncol = 2)
What I want
Can it be done without resorting to this kind of micromanaging, or a manual rescaling of the sizes like the following?
df %>%
group_by(groups) %>%
mutate(maximo = max(sizes),
sizes = scale(sizes, center = F)) %>%
basic_plot() +
facet_grid(~groups)
I can manage to do those things, I'm just trying to see if I'm not missing another option, or if I'm misunderstanding the grammar of graphics.
Thank you for your time!
As mentioned, original plot aesthetics are maintained when calling facet_wrap. Since you need grouped graphs, consider base::by (the subsetting data frame function) wrapped in do.call:
do.call(grid.arrange,
args=list(grobs=by(df, df$groups, basic_plot),
ncol=2,
top="Grouped Point Plots"))
Should you need to share a legend, I always use this wrapper from #Steven Lockton's answer
do.call(grid_arrange_shared_legend, by(df, df$groups, basic_plot))

R, how to add one break to the default breaks in ggplot?

Suppose I have the following issue: having a set of data, generate a chart indicating how many datapoints are below any given threshold.
This is fairly easy to achieve
n.data <- 215
set.seed(0)
dt <- rnorm(n.data) ** 2
x <- seq(0, 5, by=.2)
y <- sapply(x, function(i) length(which(dt < i)))
ggplot() +
geom_point(aes(x=x,y=y)) +
geom_hline(yintercept = n.data)
The question is, suppose I want to add a label to indicate what the total number of observation was (n.data). How do I do that, while maintaining the other breaks as default?
The outcome I'd like looks something like the image below, generated with the code
ggplot() +
geom_point(aes(x=x,y=y)) +
geom_hline(yintercept = n.data) +
scale_y_continuous(breaks = c(seq(0,200,50),n.data))
However, I'd like this to work even when I change the value of n.data, just by adding it to the default breaks.
(bonus points if you also get rid of the grid line between the last default break and the n.data one!)
Three years and some more knowledge of ggplot later, here's how I would do this today.
ggplot() +
geom_point(aes(x=x,y=y)) +
geom_hline(yintercept = n.data) +
scale_y_continuous(breaks = c(pretty(y), n.data))
Here is how you can get rid of the grid line between the last auto break and the manual one :
theme_update(panel.grid.minor=element_blank())
For the rest, I can't quite understand your question, as when you change n.data, your break is updated.

Customize linetype in ggplot2 OR add automatic arrows/symbols below a line

I would like to use customized linetypes in ggplot. If that is impossible (which I believe to be true), then I am looking for a smart hack to plot arrowlike symbols above, or below, my line.
Some background:
I want to plot some water quality data and compare it to the standard (set by the European Water Framework Directive) in a red line. Here's some reproducible data and my plot:
df <- data.frame(datum <- seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y=rnorm(53,mean=100,sd=40))
(plot1 <-
ggplot(df, aes(x=datum,y=y)) +
geom_line() +
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
However, in this plot it is completely unclear if the Standard is a maximum value (as it would be for example Chloride) or a minimum value (as it would be for Oxygen). So I would like to make this clear by adding small pointers/arrows Up or Down. The best way would be to customize the linetype so that it consists of these arrows, but I couldn't find a way.
Q1: Is this at all possible, defining custom linetypes?
All I could think of was adding extra points below the line:
extrapoints <- data.frame(datum2 <- seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y2=68)
plot1 + geom_point(data=extrapoints, aes(x=datum2,y=y2),
shape=">",size=5,colour="red",rotate=90)
However, I can't seem to rotate these symbols pointing downward. Furthermore, this requires calculating the right spacing of X and distance to the line (Y) every time, which is rather inconvenient.
Q2: Is there any way to achieve this, preferably as automated as possible?
I'm not sure what is requested, but it sounds as though you want arrows at point up or down based on where the y-value is greater or less than some expected value. If that's the case, then this satisfies using geom_segment:
require(grid) # as noted by ?geom_segment
(plot1 <-
ggplot(df, aes(x=datum,y=y)) + geom_line()+
geom_segment(data = data.frame( df$datum, y= 70, up=df$y >70),
aes(xend = datum , yend =70 + c(-1,1)[1+up]*5), #select up/down based on 'up'
arrow = arrow(length = unit(0.1,"cm"))
) + # adjust units to modify size or arrow-heads
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
If I'm wrong about what was desired and you only wanted a bunch of down arrows, then just take out the stuff about creating and using "up" and use a minus-sign.

How to plot stacked point histograms?

What's the ggplot2 equivalent of "dotplot" histograms? With stacked points instead of bars? Similar to this solution in R:
Plot Histogram with Points Instead of Bars
Is it possible to do this in ggplot2? Ideally with the points shown as stacks and a faint line showing the smoothed line "fit" to these points (which would make a histogram shape.)
ggplot2 does dotplots Link to the manual.
Here is an example:
library(ggplot2)
set.seed(789); x <- data.frame(y = sample(1:20, 100, replace = TRUE))
ggplot(x, aes(y)) + geom_dotplot()
In order to make it behave like a simple dotplot, we should do this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot')
You should get this:
To address the density issue, you'll have to add another term, ylim(), so that your plot call will have the form ggplot() + geom_dotplot() + ylim()
More specifically, you'll write ylim(0, A), where A will be the number of stacked dots necessary to count 1.00 density. In the example above, the best you can do is see that 7.5 dots reach the 0.50 density mark. From there, you can infer that 15 dots will reach 1.00.
So your new call looks like this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot') + ylim(0, 15)
Which will give you this:
Usually, this kind of eyeball estimate will work for dotplots, but of course you can try other values to fine-tune your scale.
Notice how changing the ylim values doesn't affect how the data is displayed, it just changes the labels in the y-axis.
As #joran pointed out, we can use geom_dotplot
require(ggplot2)
ggplot(mtcars, aes(x = mpg)) + geom_dotplot()
Edit: (moved useful comments into the post):
The label "count" it's misleading because this is actually a density estimate may be you could suggest we changed this label to "density" by default. The ggplot implementation of dotplot follow the original one of Leland Wilkinson, so if you want to understand clearly how it works take a look at this paper.
An easy transformation to make the y axis actually be counts, i.e. "number of observations". From the help page it is written that:
When binning along the x axis and stacking along the y axis, the numbers on y axis are not meaningful, due to technical limitations of ggplot2. You can hide the y axis, as in one of the examples, or manually scale it to match the number of dots.
So you can use this code to hide y axis:
ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = 1.5) +
scale_y_continuous(name = "", breaks = NULL)
I introduce an exact approach using #Waldir Leoncio's latter method.
library(ggplot2); library(grid)
set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth=0.8)
g # output to read parameter
### calculation of width and height of panel
grid.ls(view=TRUE, grob=FALSE)
real_width <- convertWidth(unit(1,'npc'), 'inch', TRUE)
real_height <- convertHeight(unit(1,'npc'), 'inch', TRUE)
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
real_binwidth <- real_width / width_coordinate_range * 0.8 # 0.8 is the argument binwidth
num_balls <- real_height / 1.1 / real_binwidth # the number of stacked balls. 1.1 is expanding value.
# num_balls is the value of A
g + ylim(0, num_balls)
Apologies : I don't have enough reputation to 'comment'.
I like cuttlefish44's "exact approach", but to make it work (with ggplot2 [2.2.1]) I had to change the following line from :
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
to
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$layout$panel_ranges[[1]]$x.range)

Resources