I'm plotting time series data, say 'data1', with plot.ts(data1).
Then I add a vertical line at the maximum with abline(v = which.max(data1)).
Now I want to add the abscissa of the maximum point (say x = 19), but sometimes it overlaps the numbers that already exist on the x-axis.
My question: how can I write the abscissa of the maximum below the numbers that already exist on x'x?
s=c(1,1.5,2,4,1,1,5,3,5,2,3,5,2,5,2,2,4,2,7,5,2,3,5,2,3,5,2,3,5,2,3,5)
plot.ts(s)
abline(v=which.max(s), col= "red", lty=2, lwd=1)
axis(1,which.max(s))
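If the goal is to print the abscissa on its own line beneath the existing tick labels, base R's mtext() can write text in the bottom margin at a chosen x position. A minimal sketch, assuming the default margins and mgp settings (tick labels on margin line 1, the "Time" title on line 3), so that line 2 falls between them:

```r
s <- c(1,1.5,2,4,1,1,5,3,5,2,3,5,2,5,2,2,4,2,7,5,2,3,5,2,3,5,2,3,5,2,3,5)
i_max <- which.max(s)  # index (abscissa) of the maximum

plot.ts(s)
abline(v = i_max, col = "red", lty = 2, lwd = 1)
# Write the abscissa in the bottom margin (side = 1), one line below the
# regular tick labels, so it cannot collide with them
mtext(as.character(i_max), side = 1, line = 2, at = i_max, col = "red")
```

If the label then crowds the "Time" title, adjust `line` or enlarge the bottom margin with par(mar = ...) before plotting.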
Does this work for you, Salman?
# axis(1,which.max(s))
library(glue)
label <- glue("Max is {max(s)}")
text(which.max(s), 0.2*max(s), label)
Ah, OK, like this? Not really sure what x'x means.
(I switched from base R to the tidyverse, which I find much easier to use and which gives beautiful plots.)
library(glue)
library(tibble)
library(ggplot2)
s <- c(1,1.5,2,4,1,1,5,3,5,2,3,5,2,5,2,2,4,2,7,5,2,3,5,2,3,5,2,3,5,2,3,5)
s <- tibble(x = 1:32, y = s)
label <- glue("Max is {max(s$y)}")
ref_line <- which.max(s$y)
ggplot(s, aes(x, y)) +
geom_line() +
labs(tag = label) +
theme(plot.tag.position = c(.65, 0.02)) +
geom_vline(xintercept = ref_line, col = "red")
My data consists of three numeric variables. Something like this:
set.seed(1)
df <- data.frame(x= rnorm(10000), y= rnorm(10000))
df$col= df$x + df$y + df$x*df$y
Plotting this as a heatmap-style scatter plot looks good:
ggplot(df, aes(x, y, col= col)) + geom_point(size= 2) + scale_color_distiller(palette = "Spectral")
But real variables can have some skewness or outliers, and this changes the plot completely. After setting df$col[nrow(df)] <- 100, the same ggplot code as above returns this plot (image omitted):
Clearly, the problem is that this one point changes the scale, so we get a plot with little information. My solution is to rank the data with rank(), which gives a reasonable color progression for every variable I've tried so far. See here:
ggplot(df, aes(x, y, col= rank(col))) + geom_point(size= 2) + scale_color_distiller(palette = "Spectral")
The problem with this solution is that the new scale (2,500 to 10,000) is shown as the color label. I want the original scale (0 to 10) to be shown as the color label instead. In other words, the color progression should follow the ranked data while the legend shows the original values; i.e., I need to somehow map the original values to the ranked color values. Is that possible? I tried setting limits = c(0, 10) inside scale_color_distiller(), but that does not help.
Side notes: I do not want to remove the outlier; ranking works well; I want to use scale_color_distiller(); and, if possible, I would rather not use any packages other than ggplot2.
Rescale the rank to the range of your original df$col:
library(tidyverse)
set.seed(1)
df <- data.frame(x = rnorm(10000), y = rnorm(10000))
df %>%
mutate(
col = x + y + x * y,
scaled_rank = scales::rescale(rank(col), range(col))
) %>%
ggplot(aes(x, y, col = scaled_rank)) +
geom_point(size = 2) +
scale_color_distiller(palette = "Spectral")
Created on 2021-11-17 by the reprex package (v2.0.1)
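If the legend should show the true values of col (rather than rescaled ranks placed on that range), an alternative sketch keeps col itself on the colour scale and moves the palette's colour stops to quantiles of the data through the `values` argument of scale_color_distiller(). It uses only ggplot2; the choice of 7 evenly spaced quantiles (one per distiller colour) is an assumption, and the result is a piecewise-linear approximation of rank-based colouring rather than an exact one:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(10000), y = rnorm(10000))
df$col <- df$x + df$y + df$x * df$y
df$col[nrow(df)] <- 100  # the outlier from the question

# Colour stops at evenly spaced quantiles of col, expressed as positions
# in [0, 1] along the range of col (the form `values` expects)
q <- quantile(df$col, probs = seq(0, 1, length.out = 7), names = FALSE)
stops <- (q - min(df$col)) / (max(df$col) - min(df$col))

p <- ggplot(df, aes(x, y, col = col)) +
  geom_point(size = 2) +
  scale_color_distiller(palette = "Spectral", values = stops)
p
```

One trade-off: the colourbar is still drawn on the original linear scale, so most of its colour change is compressed into the part of the bar where the bulk of the data lies.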
I have two cumulative distribution curves, a Gamma and a standard Normal, that I need to compare:
library(ggplot2)
pgammaX <- function(x) pgamma(x, shape = 64.57849, scale = 0.08854802)
f <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun=pgammaX)
f + stat_function(fun = pnorm)
The output looks like this (image omitted).
However, I need the two curves separated by ggplot2's faceting mechanism, sharing the y-axis, like this (image omitted).
I know how to do the faceting when the graphics come from data (i.e., from a data.frame), but I don't understand how to do it in a case like this, where the curves are generated on the fly by functions. Do you have any idea how to do this?
You can generate the data ahead of time, similar to what stat_function does internally, something like:
x <- seq(-4, 9, 0.1)
# evaluate both CDFs on the same grid and label each half by distribution
dat <- data.frame(p = c(pnorm(x), pgammaX(x)), g = rep(c("Normal", "Gamma"), each = length(x)), x = rep(x, 2))
ggplot(dat) + geom_line(aes(x, p, group = g)) + facet_grid(~g)
The issue with using facet_wrap here is that a stat_function layer is drawn in every panel, and you don't have a faceting variable to split on.
I would instead plot them separately and use grid.arrange to combine them.
f1 <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun = pgammaX) + ggtitle("Gamma") + theme(plot.title = element_text(hjust = 0.5))
f2 <- ggplot(data.frame(x=c(-4, 9)), aes(x)) + stat_function(fun = pnorm) + ggtitle("Norm") + theme(plot.title = element_text(hjust = 0.5))
library(gridExtra)
grid.arrange(f1, f2, ncol=2)
Otherwise, create a data frame with y values from both pgammaX and pnorm and label them with a faceting variable.
I finally got the answer. First, I need two data sets, with each function attached to its own data set, as follows:
library(ggplot2)
pgammaX <- function(x) pgamma(x, shape = 64.57849, scale = 0.08854802)
a <- data.frame(x=c(3,9), category="Gamma")
b <- data.frame(x=c(-4,4), category="Normal")
f <- ggplot(a, aes(x)) + stat_function(fun=pgammaX) + stat_function(data = b, mapping = aes(x), fun = pnorm)
Then facet_wrap() separates the plot into two panels according to the category assigned to each data set, with a free x scale:
f + facet_wrap("category", scales = "free_x")
I am trying to plot two variables where N = 700K. The problem is that there is so much overlap that the plot becomes mostly a solid block of black. Is there any way of having a grayscale "cloud" where the darkness of the plot is a function of the number of points in a region? In other words, instead of showing individual points, I want the plot to be a "cloud": the more points in a region, the darker that region.
One way to deal with this is alpha blending, which makes each point slightly transparent, so regions with more points plotted on them appear darker.
This is easy to do in ggplot2:
library(ggplot2)
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
ggplot(df,aes(x=x,y=y)) + geom_point(alpha = 0.3)
Another convenient way to deal with this (and probably more appropriate for the number of points you have) is hexagonal binning:
ggplot(df,aes(x=x,y=y)) + geom_hex()  # stat_binhex() in older ggplot2; requires the hexbin package
And there is also regular old rectangular binning (image omitted), which is more like your traditional heatmap:
ggplot(df,aes(x=x,y=y)) + geom_bin2d()
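Since the question asks specifically for a grayscale "cloud", the bin fill can be mapped to a light-gray-to-black gradient so that darker cells mean more points. A minimal sketch; the simulated data and the bin count of 100 are assumptions to tune for the real 700K points:

```r
library(ggplot2)

set.seed(42)
df <- data.frame(x = rnorm(5000), y = rnorm(5000))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_bin2d(bins = 100) +  # count points per rectangular cell
  # map counts to gray: few points -> light gray, many points -> black
  scale_fill_gradient(low = "grey90", high = "black") +
  theme_minimal()
p
```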
An overview of several good options in ggplot2:
library(ggplot2)
x <- rnorm(n = 10000)
y <- rnorm(n = 10000, sd=2) + x
df <- data.frame(x, y)
Option A: transparent points
o1 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.05)
Option B: add density contours
o2 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.05) +
geom_density_2d()
Option C: add filled density contours
(Note that the points distort the perception of the colors underneath, so it may be better without points.)
o3 <- ggplot(df, aes(x, y)) +
stat_density_2d(aes(fill = after_stat(level)), geom = 'polygon') +
scale_fill_viridis_c(name = "density") +
geom_point(shape = '.')
Option D: density heatmap
(Same note as C.)
o4 <- ggplot(df, aes(x, y)) +
stat_density_2d(aes(fill = after_stat(density)), geom = 'raster', contour = FALSE) +
scale_fill_viridis_c() +
coord_cartesian(expand = FALSE) +
geom_point(shape = '.', col = 'white')
Option E: hexbins
(Same note as C.)
o5 <- ggplot(df, aes(x, y)) +
geom_hex() +
scale_fill_viridis_c() +
geom_point(shape = '.', col = 'white')
Option F: rugs
Possibly my favorite option. Not quite as flashy, but visually simple and easy to understand. Very effective in many cases.
o6 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.1) +
geom_rug(alpha = 0.01)
Combine in one figure:
cowplot::plot_grid(
o1, o2, o3, o4, o5, o6,
ncol = 2, labels = 'AUTO', align = 'v', axis = 'lr'
)
You can also have a look at the ggsubplot package. This package implements features which were presented by Hadley Wickham back in 2011 (http://blog.revolutionanalytics.com/2011/10/ggplot2-for-big-data.html).
(In the following, I include the "points"-layer for illustration purposes.)
library(ggplot2)
library(ggsubplot)
# Make up some data
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each=5000),
xvar = c(rep(1:20,250) + rnorm(5000,sd=5),rep(16:35,250) + rnorm(5000,sd=5)),
yvar = c(rep(1:20,250) + rnorm(5000,sd=5),rep(16:35,250) + rnorm(5000,sd=5)))
# Scatterplot with subplots (simple)
ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1) +
geom_subplot2d(aes(xvar, yvar,
subplot = geom_bar(aes(rep("dummy", length(xvar)), ..count..))), bins = c(15,15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
However, this feature really shines when you have a third variable to control for.
# Scatterplot with subplots (including a third variable)
ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1, aes(color = factor(cond))) +
geom_subplot2d(aes(xvar, yvar,
subplot = geom_bar(aes(cond, ..count.., fill = cond))),
bins = c(15,15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
Another approach is to use smoothScatter():
smoothScatter(dat[2:3])
Alpha blending is easy to do with base graphics as well.
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
with(df, plot(x, y, col="#00000033"))
The first six hex digits after the # are the color in RGB and the last two are the opacity (alpha), also in hex, so 33 = 51/255 = 20% opaque.
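Rather than writing the hex code by hand, base R can build it from an opacity between 0 and 1: rgb() with an alpha component and adjustcolor() both return the same string for 20% black.

```r
# Black at 20% opacity: alpha 0.2 -> 0.2 * 255 = 51 = hex "33"
col_rgb <- rgb(0, 0, 0, alpha = 0.2)
col_adj <- adjustcolor("black", alpha.f = 0.2)
col_rgb  # "#00000033"

df <- data.frame(x = rnorm(5000), y = rnorm(5000))
# solid dots (pch = 16) make the transparency easier to see
with(df, plot(x, y, col = col_rgb, pch = 16))
```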
You can also use density contour lines (ggplot2):
df <- data.frame(x = rnorm(15000),y=rnorm(15000))
ggplot(df,aes(x=x,y=y)) + geom_point() + geom_density2d()
Or combine density contours with alpha blending:
ggplot(df,aes(x=x,y=y)) +
geom_point(colour="blue", alpha=0.2) +
geom_density2d(colour="black")
You may find the hexbin package useful. From the help page of hexbinplot:
library(hexbin)
mixdata <- data.frame(x = c(rnorm(5000),rnorm(5000,4,1.5)),
y = c(rnorm(5000),rnorm(5000,2,3)),
a = gl(2, 5000))
hexbinplot(y ~ x | a, mixdata)
geom_pointdensity() from the ggpointdensity package (developed by Lukas Kremer and Simon Anders, 2019) lets you visualize density and individual data points at the same time:
library(ggplot2)
# install.packages("ggpointdensity")
library(ggpointdensity)
df <- data.frame(x = rnorm(5000), y = rnorm(5000))
ggplot(df, aes(x=x, y=y)) + geom_pointdensity() + scale_color_viridis_c()
My favorite method for plotting this type of data is the one described in this question - a scatter-density plot. The idea is to do a scatter-plot but to colour the points by their density (roughly speaking, the amount of overlap in that area).
It simultaneously:
clearly shows the location of outliers, and
reveals any structure in the dense area of the plot.
Here is the result from the top answer to the linked question (image omitted).
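For a base-R version of the same idea, grDevices::densCols() returns one colour per point from a smoothed 2D density estimate (it relies on the KernSmooth package, which ships with R as a recommended package). A sketch with a gray ramp, so denser areas come out darker; the simulated data is an assumption:

```r
set.seed(1)
x <- rnorm(5000)
y <- rnorm(5000, sd = 2) + x

# One colour per point, from light gray (sparse) to black (dense)
cols <- densCols(x, y, colramp = colorRampPalette(c("grey80", "black")))
plot(x, y, col = cols, pch = 20)
```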
This answer shows how you can specify where the minor breaks should go. The documentation says that minor_breaks can be a function. However, that function takes the plot limits as input, not, as I expected, the locations of the major gridlines on either side.
It doesn't seem very simple to write a script that returns, say, 4 minor breaks per major break. I would like this because I want to use the same script on multiple datasets. I don't know the limits beforehand, so I can't hard-code them. I could of course write a function that gets the values I need from the dataset before plotting, but that seems like overkill.
Is there a general way to state the number of minor breaks per major break?
You can extract the majors from the plot, and from there calculate what minors you want and set it for your plot.
df <- data.frame(x = 0:10,
y = 0:10)
p <- ggplot(df, aes(x,y)) + geom_point()
majors <- ggplot_build(p)$panel$ranges[[1]]$x.major_source
multiplier <- 4
minors <- seq(from = min(majors),
to = max(majors),
length.out = ((length(majors) - 1) * multiplier) + 1)
p + scale_x_continuous(minor_breaks = minors)
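I think scales::extended_breaks is the default breaks function for a continuous scale. You can set the desired number of breaks in this function and make the number of minor_breaks an integer multiple of it. Note that n is only a target, so the minor breaks are not guaranteed to line up exactly with the major ones.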
library(ggplot2)
library(scales)
nminor <- 7
nmajor <- 5
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_point() +
scale_y_continuous(breaks = extended_breaks(n = nmajor), minor_breaks = extended_breaks(n = nmajor * nminor) )
Using ggplot2 version 3, I had to modify Eric Watt's code above a bit to get it to work (I can't comment on that answer since I don't have 50 reputation yet):
library(ggplot2)
df <- data.frame(x = 0:10,
y = 10:20)
p <- ggplot(df, aes(x,y)) + geom_point()
majors <- ggplot_build(p)$layout$panel_params[[1]]$x.major_source;majors
multiplier <- 10
minors <- seq(from = min(majors),
to = max(majors),
length.out = ((length(majors) - 1) * multiplier) + 1);minors
p + scale_x_continuous(minor_breaks = minors)
If I copy and paste the same code into my editor, majors is NULL, so the next line gives an error.
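A sketch that avoids digging into ggplot_build() (whose internal structure has changed between ggplot2 versions, which is likely why majors comes back NULL): compute the majors yourself from the limits with scales::extended_breaks() and pass matching functions to both breaks and minor_breaks. This assumes ggplot2 calls both functions with the same limits; `major_fun` and `minor_fun` are hypothetical helper names.

```r
library(ggplot2)

df <- data.frame(x = 0:10, y = 10:20)

# Hypothetical helper: the extended-breaks algorithm applied to
# whatever limits the scale passes in
major_fun <- function(limits) scales::extended_breaks()(limits)

# Hypothetical helper factory: n_minor intervals between each pair of majors
minor_fun <- function(n_minor) {
  function(limits) {
    majors <- major_fun(limits)
    seq(min(majors), max(majors),
        length.out = (length(majors) - 1) * n_minor + 1)
  }
}

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  scale_x_continuous(breaks = major_fun, minor_breaks = minor_fun(4))
p
```

Because both functions derive the majors from the same limits, the minors stay aligned with the majors for any dataset without hard-coding the limits.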
A ggplot2-challenged latticist needs help: What's the syntax to request variable per-facet breaks in a histogram?
library(ggplot2)
d = data.frame(x=c(rnorm(100,10,0.1),rnorm(100,20,0.1)),par=rep(letters[1:2],each=100))
# Note: breaks have different length by par
breaks = list(a=seq(9,11,by=0.1),b=seq(19,21,by=0.2))
ggplot(d, aes(x=x) ) +
geom_histogram() + ### Here the ~breaks should be added
facet_wrap(~ par, scales="free")
As pointed out by jucor, here are some more solutions.
On special request, and to show why I am not a great ggplot fan, here is the lattice version:
library(lattice)
d = data.frame(x=c(rnorm(100,10,0.1),rnorm(100,20,0.1)),par=rep(letters[1:2],each=100))
# Note: breaks have different length by par
myBreaks = list(a=seq(8,12,by=0.1),b=seq(18,22,by=0.2))
histogram(~x|par,data=d,
panel = function(x,breaks,...){
# I don't know of a generic way to get the
# grouping variable with histogram, so
# this is not very generic
par = levels(d$par)[which.packet()]
breaks = myBreaks[[par]]
panel.histogram(x,breaks=breaks,...)
},
breaks=NULL, # important to force per-panel compute
scales=list(x=list(relation="free")))
Here is one alternative:
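library(plyr)  # for dlply() and .()
# one histogram layer per level of par, each with its own breaks
hls <- mapply(function(x, b) geom_histogram(data = x, breaks = b),
dlply(d, .(par)), myBreaks, SIMPLIFY = FALSE)
ggplot(d, aes(x=x)) + hls + facet_wrap(~par, scales = "free_x")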
If you need to shrink the range of x to the data, then:
hls <- mapply(function(x, b) {
rng <- range(x$x)
# keep only the breaks inside the data range, plus the range endpoints
bb <- c(rng[1], b[rng[1] <= b & b <= rng[2]], rng[2])
geom_histogram(data = x, breaks = bb, colour = "white")
}, dlply(d, .(par)), myBreaks, SIMPLIFY = FALSE)
ggplot(d, aes(x=x)) + hls + facet_wrap(~par, scales = "free_x")
I don't think that it is possible to give different break points in each facet.
As a workaround, you can make two plots and then put them together with the grid.arrange() function from the gridExtra package. To set the bins in geom_histogram(), use binwidth= with a single bin width for each plot.
p1<-ggplot(subset(d,par=="a"), aes(x=x) ) +
geom_histogram(binwidth=0.1) +
facet_wrap(~ par)
p2<-ggplot(subset(d,par=="b"), aes(x=x) ) +
geom_histogram(binwidth=0.2) +
facet_wrap(~ par)
library(gridExtra)
grid.arrange(p1,p2,ncol=2)
Following on from Didzis's example:
ggplot(data=d, aes(x=x, y=after_stat(ncount))) +
geom_histogram(data = d[d$par == "a",], binwidth=0.1) +
geom_histogram(data = d[d$par == "b",], binwidth=0.01) +
facet_grid(.~ par, scales="free")
EDIT: This works for more levels, but of course there are already better solutions.
# More facets
# factor() keeps levels() working in R >= 4.0, where strings are no longer converted to factors
d <- data.frame(x=c(rnorm(200,10,0.1),rnorm(200,20,0.1)),par=factor(rep(letters[1:4],each=100)))
# vector of binwidths same length as number of facets - need a nicer way to calculate these
my.width=c(0.5,0.25,0.1,0.01)
# note: ggplot2:::bin is an unexported internal of older ggplot2 versions and may not exist in newer ones
out<-lapply(1:length(my.width),function(.i) data.frame(par=levels(d$par)[.i],ggplot2:::bin(d$x[d$par==levels(d$par)[.i]],binwidth=my.width[.i])))
my.df<-do.call(rbind , out)
ggplot(my.df) + geom_histogram(aes(x, y = density, width = width), stat = "identity") + facet_wrap(~par,scales="free")
from https://groups.google.com/forum/?fromgroups=#!searchin/ggplot2/bin$20histogram$20by$20facet/ggplot2/xlqRIFPP-zE/CgfigIkgAAkJ
It is not, strictly speaking, possible to give different breaks in the different facets. But you can get the same effect by having a different layer for each facet (much as in user20650's answer) and automating most of the multiple geom_histogram calls:
d <- data.frame(x=c(rnorm(100,10,0.1),rnorm(100,20,0.1)),
par=rep(letters[1:2],each=100))
breaks <- list(a=seq(9,11,by=0.1),b=seq(19,21,by=0.2))
ggplot(d, aes(x=x)) +
mapply(function(d, b) {geom_histogram(data=d, breaks=b)},
split(d, d$par), breaks, SIMPLIFY = FALSE) +
facet_wrap(~ par, scales="free_x")
The mapply call creates a list of geom_histograms which can be added to the plot. The tricky part is that you have to manually split the data (split(d, d$par)) into the data that goes into each facet.
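Because mapply() pairs the data pieces and the breaks by position, the list of breaks must be in the same order as the levels of par. A variant that looks up each facet's breaks by name instead (a sketch, assuming the names of `breaks` are exactly the values of `par`):

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = c(rnorm(100, 10, 0.1), rnorm(100, 20, 0.1)),
                par = rep(letters[1:2], each = 100))
breaks <- list(a = seq(9, 11, by = 0.1), b = seq(19, 21, by = 0.2))

# One histogram layer per facet, matched to its breaks by name
layers <- lapply(names(breaks), function(lvl) {
  geom_histogram(data = d[d$par == lvl, ], breaks = breaks[[lvl]])
})

plt <- ggplot(d, aes(x = x)) +
  layers +
  facet_wrap(~ par, scales = "free_x")
plt
```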