How to get the points inside the ellipse in ggplot2?

I'm trying to identify the densest region in a plot, and I do this using stat_ellipse() in ggplot2. But I cannot get any information about the points inside the ellipse (their total count, the index of each point, and so on).
I have seldom seen this problem discussed. Is this possible?
For example:
ggplot(faithful, aes(waiting, eruptions)) +
  geom_point() +
  stat_ellipse()

Here is Roman's suggestion implemented. The help for stat_ellipse says it uses a modified version of car::ellipse, so I chose to extract the ellipse points from the ggplot object. That way it should always be correct (even if you change options in stat_ellipse).
# Load packages
library(ggplot2)
library(sp)
# Build the plot first
p <- ggplot(faithful, aes(waiting, eruptions)) +
  geom_point() +
  stat_ellipse()
# Extract components
build <- ggplot_build(p)$data
points <- build[[1]]
ell <- build[[2]]
# Find which points are inside the ellipse, and add this to the data
dat <- data.frame(
  points[1:2],
  in.ell = as.logical(point.in.polygon(points$x, points$y, ell$x, ell$y))
)
# Plot the result
ggplot(dat, aes(x, y)) +
  geom_point(aes(col = in.ell)) +
  stat_ellipse()
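To get the total and the index of each point, as the question asks, the inside/outside vector can be summarised directly. This is a minimal sketch, assuming the rows of the built point layer are in the same order as the rows of faithful (this should hold for a single unfaceted geom_point layer, but it is an assumption). The names pts, inside, n_inside and idx_inside are illustrative only.

```r
library(ggplot2)
library(sp)

p <- ggplot(faithful, aes(waiting, eruptions)) +
  geom_point() +
  stat_ellipse()

# Layer 1 holds the points, layer 2 the ellipse outline
build <- ggplot_build(p)$data
pts <- build[[1]]
ell <- build[[2]]

# point.in.polygon() returns 0 for points strictly outside the polygon
# and a positive code for points inside it or on its boundary
inside <- point.in.polygon(pts$x, pts$y, ell$x, ell$y) > 0

n_inside <- sum(inside)      # how many points fall inside the ellipse
idx_inside <- which(inside)  # their row numbers (assumed to match faithful)

n_inside
head(faithful[idx_inside, ])
```

Points on the boundary are counted as inside here; compare against 1 instead of using > 0 to keep only strictly interior points.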

Related

Add y-axis-values to ggplot?

Say I have a ggplot, with a continuous variable allocated to the y-axis and a geom_point layer. How would I add the y-axis-value corresponding to each point to the ggplot as an additional layer?
EDIT:
For clarification: Next to each point, I'd like to see a number. That number should be the value of the y-variable that corresponds to the respective point.
This looks odd, but it seems to be in line with your question:
### Library
library(ggplot2)
### Initiating data
set.seed(2)
df <- data.frame(y=rnorm(10),
                 x=rnorm(10))
### Display plot
ggplot(df, aes(x, y)) +
  geom_point() +
  scale_y_continuous(sec.axis=sec_axis(~.,
                                       breaks=round(df$y, 2),
                                       labels=round(df$y, 2),
                                       name="Additional y axis"))
EDIT
Based on your edit, here is code that puts the y value next to each point.
### Library
library(ggplot2)
### Initiating data
set.seed(2)
df <- data.frame(y=rnorm(10),
                 x=rnorm(10))
### Display plot
ggplot(df, aes(x, y)) +
  geom_point() +
  # Map the label inside aes() and shift it right by 0.1 with nudge_x
  geom_text(aes(label=round(y, 2)), nudge_x=0.1)

modifying ggplot objects after creation

Is there a preferred way to modify ggplot objects after creation?
For example, I recommend that my students save the R object together with the PDF file for later changes...
library(ggplot2)
graph <-
  ggplot(mtcars, aes(x=mpg, y=qsec, fill=cyl)) +
  geom_point() +
  geom_text(aes(label=rownames(mtcars))) +
  xlab('miles per gallon') +
  ggtitle('my title')
ggsave('test.pdf', graph)
save(graph, file='graph.RData')
So now, in case they have to change the title, the labels, or sometimes other things, they can easily load the object and make simple changes.
load('graph.RData')
print(graph)
graph +
  ggtitle('better title') +
  ylab('seconds per quarter mile')
What do I have to do, for example, to change the colour to a discrete scale? In the original plot I would wrap cyl in as.factor. But is there a way to do it afterwards?
Or is there a better way of modifying the objects when the data is gone? I would love to get some advice.
You could use ggplot_build() to alter the plot without the code or data:
Example plot:
data("iris")
p <- ggplot(iris) +
  aes(x = Sepal.Length, y = Sepal.Width, colour = Species) +
  geom_point()
Colours are mapped to Species.
Disassemble the plot using ggplot_build():
q <- ggplot_build(p)
Take a look at the object q to see what is happening here.
To change the colour of the points, you can alter the respective table in q:
q$data[[1]]$colour <- "black"
Reassemble the plot using ggplot_gtable():
q <- ggplot_gtable(q)
And plot it:
plot(q)
Now, the points are black.
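For the specific case of switching a mapped variable to a discrete scale, another option that may work is to add a new aes() to the loaded object, since a ggplot object normally carries its data with it. This is a sketch under that assumption. It uses colour rather than fill so the effect is visible on default points, and p_cont and p_disc are illustrative names.

```r
library(ggplot2)

# Stand-in for a plot restored with load(); cyl is mapped as a continuous variable
p_cont <- ggplot(mtcars, aes(x = mpg, y = qsec, colour = cyl)) +
  geom_point()

# The data travels inside the plot object
head(p_cont$data)

# Adding aes() replaces the matching default aesthetic and keeps the others
p_disc <- p_cont + aes(colour = factor(cyl))
print(p_disc)
```

Unlike the ggplot_gtable() route, the result is still a ggplot object, so further layers, scales and labels can be added to it afterwards.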

ggplot scale transformation acts differently on points and functions

I'm trying to plot a distribution CDF using R and ggplot2. However, I am having difficulty plotting the CDF function after I transform the Y axis to obtain a straight line.
This kind of plot is frequently used in Gumbel paper plots, but here I'll use the normal distribution as an example.
I generate the data, and plot the empirical cumulative distribution function of the data along with the theoretical function. They fit well. However, when I apply a Y-axis transformation, they don't fit anymore.
sim <- rnorm(100) #Simulate some data
sim <- sort(sim) #Sort it
cdf <- seq(0,1,length.out=length(sim)) #Compute data CDF
df <- data.frame(x=sim, y=cdf) #Build data.frame
library(scales)
library(ggplot2)
#Now plot!
gg <- ggplot(df, aes(x=x, y=y)) +
  geom_point() +
  stat_function(fun = pnorm, colour="red")
gg
And the output should be something along the lines of:
Good!
Now I try to transform the Y axis according to the distribution used.
#Apply transformation
gg + scale_y_continuous(trans=probability_trans("norm"))
And the result is:
The points are transformed correctly (they lie on a straight line), but the function is not!
However, everything seems to work fine if I do it like this, calculating the CDF with ggplot:
ggplot(data.frame(x=sim), aes(x=x)) +
  stat_ecdf(geom = "point") +
  stat_function(fun="pnorm", colour="red") +
  scale_y_continuous(trans=probability_trans("norm"))
The result is OK:
Why is this happening? Why doesn't calculating the CDF manually work with scale transformations?
This works:
gg <- ggplot(df, aes(x=x, y=y)) +
  geom_point() +
  stat_function(fun ="pnorm", colour="red", inherit.aes = FALSE) +
  scale_y_continuous(trans=probability_trans("norm"))
gg
Possible explanation:
The documentation states:
inherit.aes: If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders.
My guess:
As scale_y_continuous changes the aesthetics of the main plot, we need to turn off the default inherit.aes=TRUE. It seems that with inherit.aes=TRUE, stat_function picks up its aesthetics from the first layer of the plot, so the scale transformation has no effect on it unless this is specifically requested.
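One way to sidestep the interaction between layers and the scale transformation is to apply the transformation to the data by hand and plot on an ordinary linear axis. This is a sketch, not an explanation of ggplot2's behaviour. It assumes the object returned by probability_trans("norm") exposes its forward function as tr$transform, and it uses ppoints() instead of seq(0, 1, ...) so that no probability is exactly 0 or 1 (both are changes from the question's code). In probit space the standard normal CDF becomes the line z = x, because qnorm(pnorm(x)) = x.

```r
library(ggplot2)
library(scales)

set.seed(1)
sim <- sort(rnorm(100))          # simulated, sorted sample
p_emp <- ppoints(length(sim))    # plotting positions strictly inside (0, 1)

# Forward probit transformation, the same one the y scale would apply
tr <- probability_trans("norm")
df <- data.frame(x = sim, z = tr$transform(p_emp))

# Points and reference line now live in the same (transformed) space
ggplot(df, aes(x, z)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, colour = "red") +
  ylab("qnorm(empirical CDF)")
```

The cost of this approach is that the axis is labelled in quantiles rather than probabilities; the labels would have to be relabelled by hand to recover a probability-paper look.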

3-variables plotting heatmap ggplot2

I'm currently working on a very simple data.frame, containing three columns:
x contains x-coordinates of a set of points,
y contains y-coordinates of the set of points, and
weight contains a value associated with each point.
Now, working in ggplot2 I seem to be able to plot contour levels for these data, but I can't find a way to fill the plot according to the variable weight. Here's the code that I used:
ggplot(df, aes(x,y, fill=weight)) +
  geom_density_2d() +
  coord_fixed(ratio = 1)
You can see that there's no filling whatsoever, sadly.
I've been trying for three days now, and I'm starting to get depressed.
Specifying fill=weight and/or color = weight in the general ggplot call, resulted in nothing. I've tried to use different geoms (tile, raster, polygon...), still nothing. Tried to specify the aes directly into the geom layer, also didn't work.
Tried to convert the object as a ppp but ggplot can't handle them, and also using base-R plotting didn't work. I have honestly no idea of what's wrong!
I'm attaching the first 10 points' data, which is spaced on an irregular grid:
x = c(-0.13397460,-0.31698730,-0.13397460,0.13397460,-0.28867513,-0.13397460,-0.31698730,-0.13397460,-0.28867513,-0.26794919)
y = c(-0.5000000,-0.6830127,-0.5000000,-0.2320508,-0.6547005,-0.5000000,-0.6830127,-0.5000000,-0.6547005,0.0000000)
weight = c(4.799250e-01,5.500250e-01,4.799250e-01,-2.130287e+12,5.798250e-01,4.799250e-01,5.500250e-01,4.799250e-01,5.798250e-01,6.618956e-01)
Any advice? The desired output would be something along these lines: [image link]
Thank you in advance.
From your description geom_density doesn't sound right.
You could try geom_raster:
ggplot(df, aes(x,y, fill = weight)) +
  geom_raster() +
  coord_fixed(ratio = 1) +
  scale_fill_gradientn(colours = rev(rainbow(7))) # colourmap
Here is a second-best option using fill=..level.. (the contour level computed by stat_density_2d). There is a good explanation of ..level.. here.
# load libraries
library(ggplot2)
library(RColorBrewer)
library(ggthemes)
# build your data.frame
df <- data.frame(x=x, y=y, weight=weight)
# build color Palette
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")), space="Lab")
# Plot
ggplot(df, aes(x,y, fill=..level..) ) +
  stat_density_2d( bins=11, geom = "polygon") +
  scale_fill_gradientn(colours = myPalette(11)) +
  theme_minimal() +
  coord_fixed(ratio = 1)
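If the goal is to colour regions by the weight values themselves rather than by point density, one possibility is stat_summary_2d(), which bins the x-y plane and fills each bin with a summary of a third variable mapped to z. This is a minimal sketch on made-up data (the data frame below is hypothetical, not the asker's), with the number of bins chosen arbitrarily.

```r
library(ggplot2)

# Hypothetical irregularly spaced points with a smooth weight surface
set.seed(42)
df <- data.frame(x = runif(500, -1, 1),
                 y = runif(500, -1, 1))
df$weight <- exp(-(df$x^2 + df$y^2))

# Each rectangular bin is filled with the mean weight of the points in it
p <- ggplot(df, aes(x, y, z = weight)) +
  stat_summary_2d(fun = mean, bins = 20) +
  coord_fixed(ratio = 1) +
  scale_fill_gradientn(colours = rev(rainbow(7)))
print(p)
```

Bins that contain no points are left empty, so with sparse data a coarser grid (a smaller bins value) may give a more continuous picture.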

Connect points within x values for ggplot2?

I am plotting a series of points that are grouped by two factors. I would like to add lines within one group across the other and within the x value (across the position-dodge distance) to visually highlight trends within the data.
geom_line(), geom_segment(), and geom_path() all seem to plot only to the actual x value rather than the position-dodge place of the data points. Is there a way to add a line connecting points within the x value?
Here is a structurally analogous sample:
# Create a sample data set
d <- data.frame(expand.grid(x=letters[1:3],
                            g1=factor(1:2),
                            g2=factor(1:2)),
                y=rnorm(12))
# Load ggplot2
library(ggplot2)
# Define position dodge
pd <- position_dodge(0.75)
# Define the plot
p <- ggplot(d, aes(x=x, y=y, colour=g1, group=interaction(g1,g2))) +
  geom_point(aes(shape = factor(g2)), position=pd) +
  geom_line()
# Look at the figure
p
# How to plot the line instead across g1, within g2, and within x?
Simply trying to close this question (@Axeman, please feel free to take over my answer).
p <- ggplot(d, aes(x=x, y=y, colour=g1, group=interaction(g1,g2))) +
  geom_point(aes(shape = factor(g2)), position=pd) +
  geom_line(position = pd)
# Look at the figure
p
