Add y-axis-values to ggplot? - r

Say I have a ggplot, with a continuous variable allocated to the y-axis and a geom_point layer. How would I add the y-axis-value corresponding to each point to the ggplot as an additional layer?
EDIT:
For clarification: Next to each point, I'd like to see a number. That number should be the value of the y-variable that corresponds to the respective point.

This looks weird but seems to be in line with your question
### Library
library(ggplot2)
### Initiating data
set.seed(2)
df <- data.frame(y=rnorm(10),
x=rnorm(10))
### Display plot
ggplot(df, aes(x, y)) +
geom_point() +
scale_y_continuous(sec.axis=sec_axis(~.,
breaks=round(df$y, 2),
labels=round(df$y, 2), name="Additional y axis"))
EDIT
Based on your edit, please find the code to put y values next to each dot.
### Library
library(ggplot2)
### Initiating data
set.seed(2)
df <- data.frame(y=rnorm(10),
x=rnorm(10))
### Display plot
ggplot(df, aes(x, y)) +
geom_point() +
geom_text(aes(label=round(y, 2)), nudge_x=0.1)

Related

modifying ggplot objects after creation

Is there a preferred way to modify ggplot objects after creation?
For example I recommend my students to save the r object together with the pdf file for later changes...
library(ggplot2)
graph <-
ggplot(mtcars, aes(x=mpg, y=qsec, fill=cyl)) +
geom_point() +
geom_text(aes(label=rownames(mtcars))) +
xlab('miles per gallon') +
ggtitle('my title')
ggsave('test.pdf', graph)
save(graph, file='graph.RData')
So now, in case they have to change the title, the labels, or sometimes other things, they can easily load the object and make simple changes.
load('graph.RData')
print(graph)
graph +
ggtitle('better title') +
ylab('seconds per quarter mile')
What do I have to do, for example, to change the colour to a discrete scale? In the original plot I would wrap cyl in as.factor(). But is there a way to do it afterwards?
Or is there a better way of modifying the objects when the data is gone? I would love to get some advice.
You could use ggplot_build() to alter the plot without the code or data:
Example plot:
data("iris")
p <- ggplot(iris) +
aes(x = Sepal.Length, y = Sepal.Width, colour = Species) +
geom_point()
Colours are mapped to Species.
Disassemble the plot using ggplot_build():
q <- ggplot_build(p)
Take a look at the object q to see what is happening here.
To change the colour of the point, you can alter the respective table in q:
q$data[[1]]$colour <- "black"
Reassemble the plot using ggplot_gtable():
q <- ggplot_gtable(q)
And plot it:
plot(q)
Now, the points are black.
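For the discrete-scale part of the question, a saved ggplot object keeps a copy of its data, so a mapping can be replaced after loading. A minimal sketch, assuming the plot maps cyl to colour (rather than fill, as in the question) so that the effect is visible on default points; graph_discrete is an illustrative name:

```r
library(ggplot2)

# Stand-in for a plot restored with load(); the object carries its own data
graph <- ggplot(mtcars, aes(x = mpg, y = qsec, colour = cyl)) +
  geom_point()

# Adding aes() replaces the matching default mapping, so cyl is now
# treated as a factor and gets a discrete colour scale
graph_discrete <- graph + aes(colour = factor(cyl))

# The stored data frame can be inspected as well
head(graph$data)
```

Unlike the ggplot_gtable() route, the result is still a regular ggplot object, so further layers, titles and scales can be added to it.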

Plot point on ggplot2 smoothing regression on vline intersection

I want to create a (time-series) plot from 40 million data points that shows two regression lines, each with one specific event marked on it (the first occurrence of an optimum in the time series).
Currently, I draw the regression lines and add a geom_vline to each to indicate the event.
Because I want the plot to be readable without relying on colours, it would help if I could draw the marker as a point on the regression line instead of as a geom_vline.
Do you have any idea how to solve this using ggplot2?
My current approach is this here (replaced data points with test data):
library(ggplot2)
# Generate data
m1 <- "method 1"
m2 <- "method 2"
data1 <- data.frame(Time=seq(100), Value=sample(1000, size=100), Type=rep(as.factor(m1), 100))
data2 <- data.frame(Time=seq(100), Value=sample(1000, size=100), Type=rep(as.factor(m2), 100))
df <- rbind(data1, data2)
rm(data1, data2)
# Calculate first minima for each Type
m1_intercept <- df[which(df$Type == m1), ][which.min(df[which(df$Type == m1), ]$Value),]
m2_intercept <- df[which(df$Type == m2), ][which.min(df[which(df$Type == m2), ]$Value),]
# Plot regression and vertical lines
p1 <- ggplot(df, aes(x=Time, y=Value, group=Type, colour=Type), linetype=Type) +
geom_smooth(se=F) +
geom_vline(aes(xintercept=m1_intercept$Time, linetype=m1_intercept$Type)) +
geom_vline(aes(xintercept=m2_intercept$Time, linetype=m2_intercept$Type)) +
scale_linetype_manual(name="", values=c("dotted", "dashed")) +
guides(colour=guide_legend(title="Regression"), linetype=guide_legend(title="First occurrence of optimum")) +
theme(legend.position="bottom")
ggsave("regression.png", plot=p1, height=5, width=7)
which generates this plot:
My desired plot would be something like this:
So my questions are
Does it make sense to indicate a minimum value on a regression line? The point's y-axis position would in fact be wrong; it would only indicate the time point.
If yes, how can I achieve such a behaviour?
If no, what would you think could be better?
Thank you very much in advance!
Robin
If you first run your ggplot() call with only geom_smooth(), you can access plotted values through ggplot_build(), which we then can use to plot points on the two fitted lines. Example:
# Create initial plot
p1<-ggplot(df, aes(x=Time, y=Value, colour=Type)) +
geom_smooth(se=F)
# Now we can access the fitted values
smooths <- ggplot_build(p1)$data[[1]]
smooths_1 <- smooths[smooths$group==1,] # First group (method 1)
smooths_2 <- smooths[smooths$group==2,] # Second group (method 2)
# Then we find the closest plotted values to the minima
smooth_1_x <- smooths_1$x[which.min(abs(smooths_1$x - m1_intercept$Time))]
smooth_2_x <- smooths_2$x[which.min(abs(smooths_2$x - m2_intercept$Time))]
# Subset the previously defined datasets for respective closest values
point_data1 <- smooths_1[smooths_1$x==smooth_1_x,]
point_data2 <- smooths_2[smooths_2$x==smooth_2_x,]
Now we use point_data1 and point_data2 to place the points on your plot:
ggplot(df, aes(x=Time, y=Value, colour=Type)) +
geom_smooth(se=F) +
geom_point(data=point_data1, aes(x=x, y=y), colour = "red",size = 5) +
geom_point(data=point_data2, aes(x=x, y=y), colour = "red", size = 5)
To reproduce this plot, you can use set.seed(42) for your data generation step.
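Instead of snapping to the nearest plotted grid value, the curve can also be evaluated exactly at the event time by fitting the smoother directly. This is only a sketch under an assumption: it uses stats::loess with its default settings, which is what geom_smooth() picks for fewer than 1000 observations. For the full 40 million points geom_smooth() would choose a different method, and the same model would have to be fitted here. The name markers is illustrative.

```r
set.seed(42)
df <- data.frame(
  Time  = rep(seq(100), 2),
  Value = c(sample(1000, size = 100), sample(1000, size = 100)),
  Type  = factor(rep(c("method 1", "method 2"), each = 100))
)

# For each Type: locate the first minimum, fit a loess curve, and
# evaluate the curve at exactly that time
markers <- do.call(rbind, lapply(split(df, df$Type), function(d) {
  t_min <- d$Time[which.min(d$Value)]
  fit   <- loess(Value ~ Time, data = d)
  data.frame(
    Type  = d$Type[1],
    Time  = t_min,
    Value = predict(fit, newdata = data.frame(Time = t_min))
  )
}))
```

The resulting data frame has one row per Type and can be passed to a single layer such as geom_point(data = markers, aes(x = Time, y = Value)).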

How to get the points inside of the ellipse in ggplot2?

I'm trying to identify the densest region in the plot, and I do this using stat_ellipse() in ggplot2. But I cannot get any information about the points inside the ellipse (the total count, the index of each point, and so on).
I have seldom seen this problem discussed. Is this possible?
For example:
ggplot(faithful, aes(waiting, eruptions))+
geom_point()+
stat_ellipse()
Here is Roman's suggestion implemented. The help for stat_ellipse says it uses a modified version of car::ellipse, so therefore I chose to extract the ellipse points from the ggplot object. That way it should always be correct (also if you change options in stat_ellipse).
# Load packages
library(ggplot2)
library(sp)
# Build the plot first
p <- ggplot(faithful, aes(waiting, eruptions)) +
geom_point() +
stat_ellipse()
# Extract components
build <- ggplot_build(p)$data
points <- build[[1]]
ell <- build[[2]]
# Find which points are inside the ellipse, and add this to the data
dat <- data.frame(
points[1:2],
in.ell = as.logical(point.in.polygon(points$x, points$y, ell$x, ell$y))
)
# Plot the result
ggplot(dat, aes(x, y)) +
geom_point(aes(col = in.ell)) +
stat_ellipse()
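If an exact match with the drawn ellipse is not required, the count and the row numbers can also be obtained without building the plot. Note that this swaps in a different technique: a normal-theory ellipse based on the sample mean and covariance, tested with the Mahalanobis distance. It will not select exactly the same points as the default stat_ellipse(), and the 0.95 level is an assumption chosen to mirror the default level.

```r
# Assumption: a normal-theory ellipse (sample mean and covariance),
# which is not the same ellipse that stat_ellipse() draws by default
xy <- as.matrix(faithful[, c("waiting", "eruptions")])

# Squared Mahalanobis distance of every point from the centre
d2 <- mahalanobis(xy, center = colMeans(xy), cov = cov(xy))

# Points within the 95% region under bivariate normality
inside <- d2 <= qchisq(0.95, df = 2)

n_inside <- sum(inside)    # how many points fall inside
idx      <- which(inside)  # their row numbers in faithful
```

The logical vector inside plays the same role as in.ell above and can be added to the data in the same way.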

R ggplot2: Labeling a horizontal line without associating the label with a series

I'd like to label a horizontal line on a ggplot with multiple series, without associating the line with a series. The question "R ggplot2: Labelling a horizontal line on the y axis with a numeric value" covers the single-series case, which geom_text solves. However, geom_text associates the label with one of the series via color and legend.
Consider the same example from that question, with another color column:
library(ggplot2)
df <- data.frame(y=1:10, x=1:10, col=c("a", "b")) # Added col
h <- 7.1
plot1 <- ggplot(df, aes(x=x, y=y, color=col)) + geom_point()
plot2 <- plot1 + geom_hline(aes(yintercept=h))
# Applying top answer https://stackoverflow.com/a/12876602/1840471
plot2 + geom_text(aes(0, h, label=h, vjust=-1))
How can I label the line without associating the label to one of the series?
Is this what you had in mind?
library(ggplot2)
df <- data.frame(y=1:10, x=1:10, col=c("a", "b")) # Added col
h <- 7.1
ggplot(df, aes(x=x,y=y)) +
geom_point(aes(color=col)) +
geom_hline(yintercept=h) +
geom_text(data=data.frame(x=0,y=h), aes(x, y), label=h, vjust=-1)
First, you can make the color mapping local to the points layer. Second, you do not have to put all the aesthetics into calls to aes(...), only those you want mapped to columns of the dataset. Third, you can have layer-specific datasets using data=... in the calls to a specific geom_*.
You can use annotate instead:
plot2 + annotate(geom="text", label=h, x=1, y=h, vjust=-1)
Edit: Removed drawback that x is required, since that's also true of geom_text.

Get data associated to ggplot + stat_ecdf()

I like the stat_ecdf() feature part of ggplot2 package, which I find quite useful to explore a data series. However this is only visual, and I wonder if it is feasible - and if yes how - to get the associated table?
Please have a look at the following reproducible example:
p <- ggplot(iris, aes_string(x = "Sepal.Length")) + stat_ecdf() # building of the cumulated chart
p
attributes(p) # chart attributes
p$data # data is the iris dataset, not the series used for displaying the chart
As @krfurlong showed me in this question, the layer_data function in ggplot2 can get you exactly what you're looking for without the need to recreate the data.
p <- ggplot(iris, aes_string(x = "Sepal.Length")) + stat_ecdf()
p.data <- layer_data(p)
The first column in p.data, "y", contains the ecdf values. "x" is the Sepal.Length values on the x-axis in your plot.
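Selecting the columns by name avoids depending on their position. A small sketch, assuming the default stat_ecdf() settings, which may pad the step function with infinite x values at both ends; ecdf_tab is an illustrative name:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length)) + stat_ecdf()

# Keep only the plotted coordinates, addressed by name
ecdf_tab <- layer_data(p)[, c("x", "y")]

# Drop padding rows (x = -Inf / Inf), if any were added
ecdf_tab <- ecdf_tab[is.finite(ecdf_tab$x), ]

head(ecdf_tab)
```

Each remaining row pairs a Sepal.Length value with the proportion of observations less than or equal to it.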
We can recreate the data:
# Recreate the ECDF data; ecdf() already returns values in (0, 1],
# so no rescaling is needed
x_vals <- sort(unique(iris$Sepal.Length))
dat_ecdf <- data.frame(x=x_vals, y=ecdf(iris$Sepal.Length)(x_vals))
The two plots below should look the same:
#plot using new data
ggplot(dat_ecdf,aes(x,y)) +
geom_step() +
xlim(4,8)
#plot with built-in stat_ecdf
ggplot(iris, aes_string(x = "Sepal.Length")) +
stat_ecdf() +
xlim(4,8)

Resources