How to plot points on hexbin graph in R?

I have two sets of data that I need to plot on the same graph. One set is very large (~10⁶ points) and I want to plot it with hexbin; the other is very small (~10 points) and I want to plot it as individual points. How do I plot points on top of the hexbin?
The closest I got to success was this:
bin = hexbin(x, y)
plot(bin)
pushViewport(dataViewport(x, y))
grid.points(x, y)
I appreciate any help :)

Assuming you are using the hexbin package...
library(hexbin)
library(grid)
# some data from the ?hexbin help
set.seed(101)
x <- rnorm(10000)
y <- rnorm(10000)
z <- w <- -3:3
# hexbin
bin <- hexbin(x, y)
# plot - look at str(p)
p <- plot(bin)
# push plot viewport
pushHexport(p$plot.vp)
# add points
grid.points(z, w, pch=16, gp=gpar(col="red"))
upViewport()

You can use the ggplot2 package for that task; see the code below. Just replace the data.frame passed to the data parameter of geom_point with the one holding the points you want to plot.
library(ggplot2)
library(hexbin)
ggplot(diamonds, aes(carat, price)) +
  stat_binhex() +
  geom_point(data = diamonds[c(1, 10, 100, 1000), ], aes(carat, price),
             size = 10, color = "red")

Try this... it should work fine.
Just create a panel.function within your hexbinplot function:
hexbinplot(d.frame$X ~ d.frame$Y
,aspect=...,cex.title=...
,panel=function(x, y, ...){
panel.hexbinplot(x,y,...)
# panel.curve(...) # optional stuff
# panel.text(...) # optional stuff
panel.points(x=c(25,50),y=c(100,150),pch=20,cex=3.2)
}
)
For a related example, take a look at: How to add points to multi-panel Lattice graphics bwplot?

Related

R: plotting a line and horizontal barplot on the same plot

I am trying to combine a line plot and horizontal barplot on the same plot. The difficult part is that the barplot is actually counts of the y values of the line plot.
Can someone show me how this can be done using the example below ?
library(ggplot2)
library(plyr)
x <- c(1:100)
dff <- data.frame(x = x,y1 = sample(-500:500,size=length(x),replace=T), y2 = sample(3:20,size=length(x),replace=T))
counts <- ddply(dff, ~ y1, summarize, y2 = sum(y2))
# line plot
ggplot(data=dff) + geom_line(aes(x=x,y=y1))
# bar plot
ggplot() + geom_bar(data=counts,aes(x=y1,y=y2),stat="identity")
I believe what I need is presented in the pseudocode below but I do not know how to write it out in R.
Apologies. I actually meant the secondary x axis representing the value of counts for the barplot, while primary y-axis is the y1.
ggplot(data=dff) + geom_line(aes(x=x,y=y1)) + geom_bar(data=counts , aes(primary y axis = y1,secondary x axis =y2),stat="identity")
I just want the bars to be plotted horizontally, so I tried the code below, which flips both the line chart and the barplot. That is also not what I wanted.
ggplot(data=dff) +
geom_line(aes(x=x,y=y1)) +
geom_bar(data=counts,aes(x=y2,y=y1),stat="identity") + coord_flip()
You can combine two plots in ggplot like you want by specifying different data = arguments in each geom_ layer (and none in the original ggplot() call).
ggplot() +
geom_line(data=dff, aes(x=x,y=y1)) +
geom_bar(data=counts,aes(x=y1,y=y2),stat="identity")
The following plot is the result. However, since x and y1 have different ranges, are you sure this is what you want?
Perhaps you want y1 on the vertical axis for both plots. Something like this works:
ggplot() +
geom_line(data=dff, aes(x=y1 ,y = x)) +
geom_bar(data=counts,aes(x=y1,y=y2),stat="identity", color = "red") +
coord_flip()
Maybe you are looking for this. Based on your last code, you want a double axis. Using dplyr you can store the counts in the same dataframe and then plot all the variables. Here is the code:
library(ggplot2)
library(dplyr)
#Data
x <- c(1:100)
dff <- data.frame(x = x,y1 = sample(-500:500,size=length(x),replace=T), y2 = sample(3:20,size=length(x),replace=T))
#Code
dff %>% group_by(y1) %>% mutate(Counts=sum(y2)) -> dff2
#Scale factor
sf <- max(dff2$y1)/max(dff2$Counts)
# Plot
ggplot(data=dff2)+
geom_line(aes(x=x,y=y1),color='blue',size=1)+
geom_bar(stat='identity',aes(x=x,y=Counts*sf),fill='tomato',color='black')+
scale_y_continuous(name="y1", sec.axis = sec_axis(~./sf, name="Counts"))
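The scale-factor step can be checked on its own. The following is a minimal base-R sketch with made-up numbers (the vectors y1 and counts are hypothetical stand-ins for dff2$y1 and dff2$Counts): the counts are multiplied by sf to be drawn against the primary axis, and the secondary axis divides by sf to label them in their original units.

```r
# Hypothetical values standing in for dff2$y1 and dff2$Counts
y1     <- c(-500, -120, 40, 310, 480)
counts <- c(12, 7, 20, 3, 15)

# Scale factor: maps the largest count onto the largest y1 value
sf <- max(y1) / max(counts)

# Heights actually drawn against the primary (y1) axis
counts_scaled <- counts * sf

# The secondary axis applies the inverse transformation (~ . / sf),
# so its labels are back in count units
counts_back <- counts_scaled / sf

all.equal(max(counts_scaled), max(y1))  # tallest bar reaches max(y1)
all.equal(counts_back, counts)          # the round trip recovers the counts
```

Note that this only works as intended when max(y1) is positive, which is an assumption about the data.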

Changing ellipse line type in fviz_cluster

I am using fviz_cluster from the factoextra package to plot my kmeans results, obtained using the kmeans function.
Below is the example from the factoextra package documentation.
data("iris")
iris.scaled <- scale(iris[, -5])
km.res <- kmeans(iris.scaled, 3, nstart = 25)
fviz_cluster(km.res, data = iris[, -5], repel=TRUE,
ellipse.type = "convex")
Running this command, you will probably see three clusters, each with a different colour. Instead, I want all of them to have the same colour but a different line type for each ellipse. Do you know how to do it?
One solution is to use the data you get from fviz_cluster() to build your own custom plot with ggplot.
Basically you just need to access the x, y coordinates of each point, plus the cluster information, and then you can recreate the plot yourself.
First save the data used for the plot from fviz_cluster(), then use chull() to find the convex hull of each cluster, then plot.
library(ggplot2)
library(factoextra)
library(dplyr) # needed for %>%, group_by() and slice()
# your example:
iris.scaled <- scale(iris[, -5])
km.res <- kmeans(iris.scaled, 3, nstart = 25)
p <- fviz_cluster(km.res, data = iris[, -5], repel=TRUE,
ellipse.type = "convex") # save to access $data
# save '$data'
data <- p$data # this is all you need
# calculate the convex hull using chull(), for each cluster
hull_data <- data %>%
group_by(cluster) %>%
slice(chull(x, y))
# plot: you can now customize this by using ggplot syntax
ggplot(data, aes(x, y)) + geom_point() +
geom_polygon(data = hull_data, alpha = 0.5, aes(fill=cluster, linetype=cluster))
Of course you can now change the axis labels, add a title, and add a label for each point if you need to.
Here is an example possibly closer to your needs:
ggplot(data, aes(x, y)) + geom_point() +
geom_polygon(data = hull_data, alpha=0.2, lwd=1, aes(color=cluster, linetype=cluster))
linetype varies the line type per cluster, and lwd makes the lines thicker. It is also better to remove the fill aesthetic and use color instead.
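If you prefer to avoid the extra packages, the same per-cluster hull idea can be sketched in base R. This is only an illustration under assumptions: the points are projected onto the first two principal components of the scaled data, which I assume approximates the coordinates fviz_cluster() draws, and the seed is there purely for reproducibility.

```r
# Base-R sketch: one convex hull per cluster, same colour, different line type
iris.scaled <- scale(iris[, -5])
set.seed(1)  # assumption: fixed seed only so the run is reproducible
km.res <- kmeans(iris.scaled, 3, nstart = 25)

# Project onto the first two principal components (assumed to be close to
# what fviz_cluster() plots for data with more than two columns)
pc  <- prcomp(iris.scaled)$x[, 1:2]
pts <- data.frame(x = pc[, 1], y = pc[, 2],
                  cluster = factor(km.res$cluster))

# chull() returns the row indices of the hull vertices, in drawing order
hulls <- lapply(split(pts, pts$cluster),
                function(d) d[chull(d$x, d$y), ])
hull_data <- do.call(rbind, hulls)

# Draw the points, then each hull with the same colour but its own line type
plot(pts$x, pts$y, pch = 16, xlab = "PC1", ylab = "PC2")
for (i in seq_along(hulls)) {
  polygon(hulls[[i]]$x, hulls[[i]]$y, border = "black", lty = i, lwd = 2)
}
```

Here hull_data has the same shape as the dplyr result above (columns x, y, cluster), so it could also be passed to geom_polygon().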

ggplot: clipping lines between facets

Say I have a plot like this:
# Load libraries
library(ggplot2)
library(grid)
# Load data
data(mtcars)
# Plot results
p <- ggplot(data = mtcars)
p <- p + geom_bar(aes(cyl))
p <- p + coord_flip()
p <- p + facet_wrap(~am)
print(p)
Now, I want to plot lines all the way across both facets where the bars are. I add this:
p <- p + geom_vline(aes(xintercept = cyl))
which adds the lines, but they don't cross both facets. So, I try to turn off clipping using this solution:
# Turn off clipping
gt <- ggplot_gtable(ggplot_build(p))
gt$layout$clip[gt$layout$name == "panel"] <- "off"
# Plot results
grid.draw(gt)
but that doesn't solve the problem: the lines are still clipped. So, I wondered if this is specific to geom_vline and tried approaches with geom_abline and geom_line (the latter with values across ±Inf), but the results are the same. In other posts, the clipping solution seems to work for text and points, but presumably in this case the lines are only defined within the limits of the figure. (I even tried gt$layout$clip <- "off" to switch off all possible clipping, but that didn't solve the problem.) Is there a workaround?
library(grid)
library(gtable)
# Starting from your plot `p`
gb <- ggplot_build(p)
g <- ggplot_gtable(gb)
# Get position of y-axis tick marks
ys <- gb$layout$panel_ranges[[1]][["y.major"]]
# Add segments at these positions
# subset `ys` if you only want to add a few
# have a look at g$layout for relevant `l` and `r` positions
g <- gtable_add_grob(g, segmentsGrob(y0=ys, y1=ys,
gp=gpar(col="red", lty="dashed")),
t = 7, l = 4, r=8)
grid.newpage()
grid.draw(g)
See "ggplot, drawing multiple lines across facets" for how to rescale values for more general plotting, i.e.
data2npc <- function(x, panel = 1L, axis = "x") {
range <- gb$layout$panel_ranges[[panel]][[paste0(axis,".range")]] # gb is the ggplot_build() result from above
scales::rescale(c(range, x), c(0,1))[-c(1,2)]
}
start <- sapply(c(4,6,8), data2npc, panel=1, axis="y")
g <- gtable_add_grob(g, segmentsGrob(y0=start, y1=start),
t=7, l=4, r=8)
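The rescaling inside data2npc is just a linear map from data units to the 0..1 "npc" units of the panel. A minimal base-R sketch of that arithmetic follows; the function name to_npc and the axis range used here are hypothetical, and in practice the range would come from the built plot. It matches the scales::rescale() call above only when the values lie inside the axis range.

```r
# Linear map from data units to npc (0..1) along one panel axis
to_npc <- function(x, range) {
  (x - range[1]) / diff(range)
}

# Hypothetical axis range, for illustration only
y_range <- c(3.6, 8.4)

# Positions of the values 4, 6 and 8 as fractions of the panel height
start <- to_npc(c(4, 6, 8), y_range)
start
```

The resulting fractions can be passed as y0 and y1 to segmentsGrob(), whose default units are npc.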

How to get the points inside of the ellipse in ggplot2?

I'm trying to identify the densest region in the plot, and I do this using stat_ellipse() in ggplot2. But I cannot get information about the points inside the ellipse (the total count, the row number of each point, and so on).
I have seldom seen this problem discussed. Is this possible?
For example:
ggplot(faithful, aes(waiting, eruptions))+
geom_point()+
stat_ellipse()
Here is Roman's suggestion implemented. The help for stat_ellipse says it uses a modified version of car::ellipse, so I chose to extract the ellipse points from the ggplot object. That way the result should always be correct, even if you change the options in stat_ellipse.
# Load packages
library(ggplot2)
library(sp)
# Build the plot first
p <- ggplot(faithful, aes(waiting, eruptions)) +
geom_point() +
stat_ellipse()
# Extract components
build <- ggplot_build(p)$data
points <- build[[1]]
ell <- build[[2]]
# Find which points are inside the ellipse, and add this to the data
dat <- data.frame(
points[1:2],
in.ell = as.logical(point.in.polygon(points$x, points$y, ell$x, ell$y))
)
# Plot the result
ggplot(dat, aes(x, y)) +
geom_point(aes(col = in.ell)) +
stat_ellipse()
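To get the counts and row numbers without building the plot or loading sp, a Mahalanobis-distance test is a possible alternative. This is a sketch under assumptions, not the same computation as above: it uses the plain sample mean and covariance, which as far as I understand corresponds to stat_ellipse(type = "norm") rather than the default type = "t", and the radius formula is my reading of how that ellipse is sized, so the result may differ slightly from the polygon test.

```r
# Normal-theory ellipse test in base R
# (assumption: approximates stat_ellipse(type = "norm"), not the default "t")
xy     <- as.matrix(faithful[, c("waiting", "eruptions")])
center <- colMeans(xy)
shape  <- cov(xy)
level  <- 0.95

# Squared ellipse radius in Mahalanobis units (assumed F-based sizing)
r2 <- 2 * qf(level, 2, nrow(xy) - 1)

# A point is inside when its squared Mahalanobis distance is within r2
inside <- mahalanobis(xy, center, shape) <= r2

sum(inside)           # total number of points inside the ellipse
head(which(inside))   # row numbers of the first few points inside
```

With the dat object from the code above, the equivalent summaries would be sum(dat$in.ell) and which(dat$in.ell).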

R: Density plot vs Density plot in ggplot2

I am trying to do some density plots in R. I originally used the base density plot, but I switched to the density plot in ggplot2 because I prefer how ggplot2 looks.
So I made one plot with plot(density()) and one with ggplot2 (see below), but I found the plots were not identical. It looks like some of the y-values have been lost or dropped in the ggplot2 version (right plot). Is there any particular reason for this? How can I make the ggplot2 plot identical to the base density plot (left plot)?
Code:
library(ggplot2)
library(grid)
par(mfrow=c(1,2))
# define function to create multi-plot setup (nrow, ncol)
vp.setup <- function(x,y){
grid.newpage()
pushViewport(viewport(layout = grid.layout(x,y)))
}
# define function to easily access layout (row, col)
vp.layout <- function(x,y){
viewport(layout.pos.row=x, layout.pos.col=y)
}
vp.setup(1,2)
dat <- read.table(textConnection("
low high
10611.0 14195.0
10759.0 14437.0
10807.0 14574.0
10714.0 14380.0
10768.0 14448.0
10601.0 14239.0
10579.0 14218.0
10806.0 14510.0
"), header=TRUE) # default whitespace separator, so spaces or tabs both parse
plot(density(dat$low))
dat.low = data.frame(low2 = c(dat$low), lines = rep(c("low")))
low_plot_gg = (ggplot(dat.low, aes(x = low2, fill = lines)) +
stat_density(aes(y = ..density..)) +
coord_cartesian(xlim = c(10300, 11000))
)
print(low_plot_gg, vp=vp.layout(1,2))
Based on some trial and error, it looks like you want
+ xlim(c(10300,11000))
rather than
+ coord_cartesian(xlim = c(10300, 11000))
coord_cartesian extends the limits of the plots but doesn't change what's drawn inside them at all ...
It's not a problem of lost values. plot(density()) extends the smoothed curve beyond the extreme values, which is not very accurate for such a small dataset. With a larger dataset the two plots should look much the same.
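The difference in range can be seen in base R alone. In this minimal sketch, density() with its defaults evaluates the curve on a grid that extends past the data, while restricting the grid to the data range cuts the tails off. That the second case mirrors what stat_density() does when the x limits are left at the data range is my assumption about ggplot2's behaviour, not something the posts above state.

```r
# The `low` column from the question
low <- c(10611, 10759, 10807, 10714, 10768, 10601, 10579, 10806)

# Default: the grid extends cut * bw (cut = 3) beyond the data extremes
d_full <- density(low)

# Grid restricted to the data range, so the tails are not drawn
# (assumed to mimic stat_density() without widened x limits)
d_trim <- density(low, from = min(low), to = max(low))

range(low)        # data range
range(d_full$x)   # wider than the data
range(d_trim$x)   # same as the data range
```

Widening the x scale, as xlim(c(10300, 11000)) does, gives the estimate room to draw the tails, which is why it ends up resembling the base plot.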

Resources