Transformation doesn't transform line segments in ggplot2

I am trying to annotate a plot with some line segments. The x-axis is best displayed on a log scale. I am using ggplot2, which handles transformations, so I shouldn't have to transform the locations of my line segments myself. But when I apply a transformation, the line segments disappear (well, they no longer "fit" into the plot window because of the transformation). Any suggestions on how to get them to "follow" the transformation?
Minimal example:
library(ggplot2)
## Base plot:
B <- ggplot(data = data.frame(X = 10^(1:10), Y = 1:10),
            aes(x = X, y = Y)) + geom_point()
## Generate segments:
S1 <- geom_segment(x = 1000, xend = 1000,
                   y = 3, yend = 5)
S2 <- geom_segment(x = 20, xend = 2.5e9,
                   y = 8, yend = 7)
## Generate transformation:
T <- scale_x_continuous(trans = "log")
Compare the following:
B # Basic plot
B + T # Basic plot, transformed
B + S1 + S2 # Basic, untransformed, with segments
B + S1 + S2 + T # Should be transformed with segments: segments missing
I know I could just transform the locations of the segments, but I'd really rather find a more ggplot2-style solution!
Hack solution:
S3 <- geom_segment(x = log(1000), xend = log(1000),
                   y = 3, yend = 5)
S4 <- geom_segment(x = log(20), xend = log(2.5e9),
                   y = 8, yend = 7)
B + S1 + S2
B + S3 + S4 + T # Fine, but not elegant.
Thanks!

Not sure if the plot I've shown is what you expect, but if it is, the explanation below applies.
In ggplot2, transformations are applied to aesthetics: the data is transformed first, before plotting, and anything along the lines of fitting (e.g. geom_smooth) is also done on the transformed data.
So, if you want the log transformation to be reflected in your segments, you'll have to wrap their coordinates in aes():
S1 <- geom_segment(aes(x=1000, xend=1000, y=3, yend=5))
S2 <- geom_segment(aes(x=20, xend=2.5e9, y=8, yend=7))
And by the way, your transformation should be log10, NOT log:
T <- scale_x_continuous(trans = "log10")
Now, if you plot B + S1 + S2 + T:
One step further: compare your B + S1 + S2 + T with the version using my modified S1 and S2 by running:
ggplot_build(B+S1+S2)$data # and
ggplot_build(B+S1+S2+T)$data
to see that the aesthetics get transformed accordingly.
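For instance, a quick sanity check (assuming the segments come through as the second and third layers of the built plot) might look like this:
## Hypothetical check: layer 2 is S1; under trans = "log10" its x aesthetic
## should come out as log10(1000) = 3 in the built data
ggplot_build(B + S1 + S2 + T)$data[[2]]$x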

Related

Remove data to the left and right of local minima

I have a lot of measurements where I get data that looks something like this:
# Generate example data
x <- 1:100
y <- 100*(1-exp(-0.3*x))
x2 <- 101:200
y2 <- rev(y)
df <- data.frame("x" = c(x, x2),
                 "y" = c(y, y2))
df$x <- df$x + 50
rm(x, x2, y, y2)
x <- 1:50
y <- 25.91818
x2 <- 251:300
y2 <- 25.91818
df2 <- data.frame("x" = c(x, x2),
                  "y" = c(y, y2))
rm(x, x2, y, y2)
df <- rbind(df, df2)
rm(df2)
If I plot this I can see that there are left-most and right-most local minima.
library(ggplot2)
p <- ggplot(df, aes(x, y)) +
  geom_line() +
  geom_point(data = data.frame("x" = c(50, 250), "y" = c(25.91818, 25.91818)),
             mapping = aes(x, y), colour = "red") +
  scale_y_continuous(limits = c(0, 101))
p + annotate("text", label = "minimum 1", x = 50, y = 20) +
  annotate("text", label = "minimum 2", x = 250, y = 20)
What I would like to do is trim the data to the left of minimum 1 and to the right of minimum 2. It's not entirely straightforward, as there may also be local minima between those two points, because the real data doesn't look this ideal. I would also need to apply this process to many samples, but I think that part may be trivial because I could use e.g. dplyr and group_by().
I had some luck plotting the local minima using the ggpmisc package, but I'm not sure how I can use that to actually subset my data. Just for clarity I included the code to do so below, and with the real data it looks a little better:
library(ggpmisc)
p2 <- ggplot(df, aes(x, y)) +
  geom_line() +
  ggpmisc::stat_peaks(col = "red", span = 3)
p2
I hope this is clear and I'm happy to clarify any questions. Thank you in advance.
You could do this using the following steps:
1. Sort your data according to its x co-ordinates.
2. On your sorted data, find the diff of the y co-ordinates, which will be 0 (or close to 0) for the flat sections at either end (as well as any flat sections in between).
3. Starting from the left, find the first point where the diff is not zero (or at least is above a minimal threshold). Store this index as a variable called left.
4. Starting from the right, find the first point where the diff is not zero (or at least is above a minimal threshold). Store this index as a variable called right.
5. Subset your data frame so it only contains the data between rows left:right.
So, in your example we would have:
# Define a minimal threshold above which we are not at the minimum line
minimal_change <- 1e-6
df <- df[order(df$x),] # Step 1
left <- which(diff(df$y) > minimal_change)[1] # Steps 2-3
right <- nrow(df) - which(diff(rev(df$y)) > minimal_change)[1] + 1 # Step 4
df <- df[left:right, ] # Step 5
Now we can plot the result:
ggplot(df, aes(x, y)) +
  geom_line() +
  geom_point(data = data.frame("x" = c(50, 250), "y" = c(25.91818, 25.91818)),
             mapping = aes(x, y), colour = "red") +
  scale_y_continuous(limits = c(0, 101)) +
  scale_x_continuous(limits = c(0, 300))
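Since you mention needing to apply this to many samples, here is a minimal sketch that wraps the same steps in a helper and runs it per group with dplyr; the sample column used for grouping is hypothetical and stands in for whatever identifies each measurement in your real data:
library(dplyr)

trim_flat_ends <- function(d, minimal_change = 1e-6) {
  d <- d[order(d$x), ]                                        # Step 1: sort by x
  left  <- which(diff(d$y) > minimal_change)[1]               # Steps 2-3: first rise from the left
  right <- nrow(d) - which(diff(rev(d$y)) > minimal_change)[1] + 1 # Step 4: first rise from the right
  d[left:right, ]                                             # Step 5: keep the middle
}

df_trimmed <- df %>%
  group_by(sample) %>%               # 'sample' is a hypothetical grouping column
  group_modify(~ trim_flat_ends(.x)) %>%
  ungroup()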

A ggplot2 equivalent of the lines() function in base R graphics

For reasons I won't go into I need to plot a vertical normal curve on a blank ggplot2 graph. The following code gets it done as a series of points with x,y coordinates
dfBlank <- data.frame()
g <- ggplot(dfBlank) + xlim(0.58,1) + ylim(-0.2,113.2)
hdiLo <- 31.88
hdiHi <- 73.43
yComb <- seq(hdiLo, hdiHi, length = 75)
xVals <- 0.79 - (0.06*dnorm(yComb, 52.65, 10.67))/0.05
dfVertCurve <- data.frame(x = xVals, y = yComb)
g + geom_point(data = dfVertCurve, aes(x = x, y = y), size = 0.01)
The curve is clearly discernible but is a series of points. The lines() function in base R graphics would turn these points into a smooth line.
Is there a ggplot2 equivalent?
I see two different ways to do it.
geom_segment
The first uses geom_segment to 'link' each point with its next one.
hdiLo <- 31.88
hdiHi <- 73.43
yComb <- seq(hdiLo, hdiHi, length = 75)
xVals <- 0.79 - (0.06*dnorm(yComb, 52.65, 10.67))/0.05
dfVertCurve <- data.frame(x = xVals, y = yComb)
library(ggplot2)
ggplot() +
  xlim(0.58, 1) +
  ylim(-0.2, 113.2) +
  geom_segment(data = dfVertCurve, aes(x = x, xend = dplyr::lead(x), y = y, yend = dplyr::lead(y)), size = 0.01)
#> Warning: Removed 1 rows containing missing values (geom_segment).
As you can see, it just links the points you created. The last point does not have a next one, so the last segment is removed (see the warning).
stat_function
The second one, which I think is better and more ggplot-like, uses stat_function().
library(ggplot2)
f = function(x) .79 - (.06 * dnorm(x, 52.65, 10.67)) / .05
hdiLo <- 31.88
hdiHi <- 73.43
yComb <- seq(hdiLo, hdiHi, length = 75)
ggplot() +
  xlim(-0.2, 113.2) +
  ylim(0.58, 1) +
  stat_function(data = data.frame(yComb), fun = f) +
  coord_flip()
This builds a proper function (y = f(x)) and plots it. Note that it is built on the x axis and then flipped, which is why the xlim and ylim are swapped.
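For completeness, since dfVertCurve already holds the points in order along the curve, geom_path() is arguably the most direct translation of base lines(): it connects rows in the order given (unlike geom_line(), which reorders by x and would scramble a vertical curve). A minimal sketch reusing g and dfVertCurve from the question:
## Connect the precomputed points in row order, as base lines() would
g + geom_path(data = dfVertCurve, aes(x = x, y = y))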

Add segments of circles to ggplot based on product of x & y

I want to add shaded areas to a chart to help people understand where bad, ok, and good points can fit.
Good = x*y>=.66
Ok = x*y>=.34
Bad = x*y<.34
Generating the right sequence of data to correctly apply the curved boundaries to the chart is proving tough.
What is the most elegant way to generate the curves?
Bonus Q: How would you do this to produce non-overlapping areas so that different colours could be used?
Updates
I've managed, in a rather hacky way, to draw the circle segments. I've updated the MRE to use the revised segMaker function.
MRE
library(ggplot2)
pts<-seq(0,1,.02)
x<-sample(pts,50,replace=TRUE)
y<-sample(pts,50,replace=TRUE)
# What function will generate the correct sequence of values, as these are linear?
segMaker <- function(x, by){
  # Original:
  # data.frame(x = c(seq(0, x, by), 0),
  #            y = c(seq(x, 0, -by), 0))
  zero <- data.frame(x = 0, y = 0)
  rs <- seq(0, pi, by)
  xc <- x * cos(rs)
  yc <- x * sin(rs)
  gr <- data.frame(x = xc, y = yc)
  gr <- rbind(gr[gr$x >= 0, ], zero)
  return(gr)
}
firstSeg  <- segMaker(.34, 0.02)
secondSeg <- segMaker(.66, 0.02)
thirdSeg  <- segMaker(1, 0.02)
ggplot(data.frame(x, y), aes(x, y, colour = x * y)) +
  geom_point() +
  geom_polygon(data = firstSeg,  fill = "blue", alpha = .25) +
  geom_polygon(data = secondSeg, fill = "blue", alpha = .25) +
  geom_polygon(data = thirdSeg,  fill = "blue", alpha = .25)
Current & desired shadings
You can create a data frame with the boundaries between each region and then use geom_ribbon to plot it. Here's an example using the conditions you supplied (which result in boundaries that are the reciprocal function, rather than circles, but the idea is the same, whichever function you use for the boundaries):
library(ggplot2)
# Fake data
pts<-seq(0,1,.02)
set.seed(19485)
x<-sample(pts,50,replace=TRUE)
y<-sample(pts,50,replace=TRUE)
df = data.frame(x,y)
# Region boundaries
x = seq(0.001,1.1,0.01)
bounds = data.frame(x, ymin = c(-100/x, 0.34/x, 0.66/x),
                    ymax = c(0.34/x, 0.66/x, 100/x),
                    g = rep(c("Bad","OK","Good"), each = length(x)))
bounds$g = factor(bounds$g, levels=c("Bad","OK","Good"))
ggplot() +
  coord_cartesian(ylim = 0:1, xlim = 0:1) +
  geom_ribbon(data = bounds, aes(x, ymin = ymin, ymax = ymax, fill = g), colour = "grey50", lwd = 0.2) +
  geom_point(data = df, aes(x, y), colour = "grey20") +
  scale_fill_manual(values = hcl(c(15, 40, 240), 100, 80)) +
  # scale_fill_manual(values = hcl(c(15, 40, 240), 100, 80, alpha = 0.25)) + # use this if you want the fill colours to be transparent
  labs(fill = "") +
  guides(fill = guide_legend(reverse = TRUE))
For circular boundaries, assuming we want boundaries at r=1/3 and r=2/3:
# Calculate y for circle, given r and x
cy = function(r, x) {sqrt(r^2 - x^2)}
n = 200
x = unlist(lapply(c(1/3,2/3,1), function(to) seq(0, to, len=n)))
bounds = data.frame(x, ymin = c(rep(0, n),
                                cy(1/3, seq(0, 1/3, len = n/2)), rep(0, n/2),
                                cy(2/3, seq(0, 2/3, len = 2*n/3)), rep(0, n/3)),
                    ymax = c(cy(1/3, seq(0, 1/3, len = n)),
                             cy(2/3, seq(0, 2/3, len = n)),
                             rep(1, n)),
                    g = rep(c("Bad","OK","Good"), each = n))
bounds$g = factor(bounds$g, levels=c("Bad","OK","Good"))
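These circular bounds can then go into the same geom_ribbon() call used above, for example (reusing df and the plotting code from earlier):
ggplot() +
  coord_cartesian(ylim = 0:1, xlim = 0:1) +
  geom_ribbon(data = bounds, aes(x, ymin = ymin, ymax = ymax, fill = g), colour = "grey50", lwd = 0.2) +
  geom_point(data = df, aes(x, y), colour = "grey20") +
  scale_fill_manual(values = hcl(c(15, 40, 240), 100, 80)) +
  labs(fill = "") +
  guides(fill = guide_legend(reverse = TRUE))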
If you can use a github package, ggforce adds geom_arc_bar():
# devtools::install_github('thomasp85/ggforce')
library(ggplot2)
library(ggforce)
pts<-seq(0,1,.02)
x<-sample(pts,50,replace=TRUE)
y<-sample(pts,50,replace=TRUE)
arcs <- data.frame(
  x0 = 0,
  y0 = 0,
  start = 0,
  end = pi / 2,
  r0 = c(0, 1/3, 2/3),
  r = c(1/3, 2/3, 1),
  fill = c("bad", "ok", "good")
)
ggplot() +
  geom_arc_bar(data = arcs,
               aes(x0 = x0, y0 = y0, start = start, end = end, r0 = r0, r = r,
                   fill = fill), alpha = 0.6) +
  geom_point(data = data.frame(x = x, y = y),
             aes(x = x, y = y))
Based on @eipi10's great answer, this is what I did to handle the product condition (it basically ends up in the same place):
library(ggplot2)
library(data.table)
set.seed(19485)
pts <- seq(0, 1, .001)
x <- sample(pts, 50, replace = TRUE)
y <- sample(pts, 50, replace = TRUE)
df <- data.frame(x,y)
myRibbon <- CJ(pts, pts)
myRibbon[, prod := V1 * V2]
myRibbon[, cat := ifelse(prod <= 1/3, "bad",
                         ifelse(prod <= 2/3, "ok", "good"))]
myRibbon <- myRibbon[, .(ymin = min(V2), ymax = max(V2)), .(cat, V1)]
ggplot() +
  geom_ribbon(data = myRibbon,
              aes(x = V1, ymin = ymin, ymax = ymax,
                  group = cat, fill = cat),
              colour = "grey90", lwd = 0.2, alpha = .5) +
  geom_point(data = df, aes(x, y), colour = "grey20") +
  theme_minimal()
This doesn't do anything fancy: for each value of x, it works out the smallest and largest y values that could give rise to a specific banding.
If I had just wanted arcs, the use of ggforce (@GregF) would be really great; it tucks away all the complexity.

topoplot in ggplot2 – 2D visualisation of e.g. EEG data

Can ggplot2 be used to produce a so-called topoplot (often used in neuroscience)?
Sample data:
label x y signal
1 R3 0.64924459 0.91228430 2.0261520
2 R4 0.78789621 0.78234410 1.7880972
3 R5 0.93169511 0.72980685 0.9170998
4 R6 0.48406513 0.82383895 3.1933129
Full sample data.
Rows represent individual electrodes. Columns x and y represent the projection into 2D space and the column signal is essentially the z-axis representing voltage measured at a given electrode.
stat_contour doesn't work, apparently due to unequal grid.
geom_density_2d only provides a density estimation of x and y.
geom_raster either isn't suited to this task or I must be using it incorrectly, since it quickly runs out of memory.
Smoothing (like in the image on the right) and head contours (nose, ears) aren't necessary.
I want to avoid Matlab and transforming the data so that it fits this or that toolbox… Many thanks!
Update (26 January 2016)
The closest I've been able to get to my objective is via
library(colorRamps)
ggplot(channels, aes(x, y, z = signal)) +
  stat_summary_2d() +
  scale_fill_gradientn(colours = matlab.like(20))
which produces an image like this:
Update 2 (27 January 2016)
I've tried #alexforrence's approach with full data and this is the result:
It's a great start, but there are a couple of issues:
The last call (ggplot()) takes about 40 seconds on an Intel i7 4790K while Matlab toolboxes manage to generate these almost instantly; my ‘emergency solution’ above takes about a second.
As you can see, the upper and lower border of the central part appear to be ‘sliced’ – I'm not sure what causes this but it could be the third issue.
I'm getting these warnings:
1: Removed 170235 rows containing non-finite values (stat_contour).
2: Removed 170235 rows containing non-finite values (stat_contour).
Update 3 (27 January 2016)
Comparison between two plots produced with different interp(xo, yo) and stat_contour(binwidth) values:
Ragged edges if one chooses low interp(xo, yo), in this case xo/yo = seq(0, 1, length = 100):
Here's a potential start:
First, we'll attach some packages. I'm using akima to do linear interpolation, though it looks like EEGLAB uses some sort of spherical interpolation here? (the data was a little sparse to try it).
library(ggplot2)
library(akima)
library(reshape2)
Next, reading in the data:
dat <- read.table(text = " label x y signal
1 R3 0.64924459 0.91228430 2.0261520
2 R4 0.78789621 0.78234410 1.7880972
3 R5 0.93169511 0.72980685 0.9170998
4 R6 0.48406513 0.82383895 3.1933129")
We'll interpolate the data, and stick that in a data frame.
datmat <- interp(dat$x, dat$y, dat$signal,
                 xo = seq(0, 1, length = 1000),
                 yo = seq(0, 1, length = 1000))
datmat2 <- melt(datmat$z)
names(datmat2) <- c('x', 'y', 'value')
datmat2[,1:2] <- datmat2[,1:2]/1000 # scale it back
I'm going to borrow from some previous answers. The circleFun below is from Draw a circle with ggplot2.
circleFun <- function(center = c(0, 0), diameter = 1, npoints = 100){
  r <- diameter / 2
  tt <- seq(0, 2*pi, length.out = npoints)
  xx <- center[1] + r * cos(tt)
  yy <- center[2] + r * sin(tt)
  return(data.frame(x = xx, y = yy))
}
circledat <- circleFun(c(.5, .5), 1, npoints = 100) # center on [.5, .5]
# ignore anything outside the circle
datmat2$incircle <- (datmat2$x - .5)^2 + (datmat2$y - .5)^2 < .5^2 # mark
datmat2 <- datmat2[datmat2$incircle,]
And I really liked the look of the contour plot in R plot filled.contour() output in ggplot2, so we'll borrow that one.
ggplot(datmat2, aes(x, y, z = value)) +
  geom_tile(aes(fill = value)) +
  stat_contour(aes(fill = ..level..), geom = 'polygon', binwidth = 0.01) +
  geom_contour(colour = 'white', alpha = 0.5) +
  scale_fill_distiller(palette = "Spectral", na.value = NA) +
  geom_path(data = circledat, aes(x, y, z = NULL)) +
  # draw the nose (haven't drawn ears yet)
  geom_line(data = data.frame(x = c(0.45, 0.5, .55), y = c(1, 1.05, 1)),
            aes(x, y, z = NULL)) +
  # add points for the electrodes
  geom_point(data = dat, aes(x, y, z = NULL, fill = NULL),
             shape = 21, colour = 'black', fill = 'white', size = 2) +
  theme_bw()
With improvements mentioned in the comments (setting extrap = TRUE and linear = FALSE in the interp call to fill in gaps and do a spline smoothing, respectively, and removing NAs before plotting), we get:
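For reference, a sketch of that modified interp() call and the NA removal; note that in akima, extrap = TRUE only takes effect when linear = FALSE, i.e. when spline interpolation is used:
datmat <- interp(dat$x, dat$y, dat$signal,
                 xo = seq(0, 1, length = 1000),
                 yo = seq(0, 1, length = 1000),
                 linear = FALSE, extrap = TRUE)   # spline smoothing + fill the gaps
datmat2 <- melt(datmat$z)
names(datmat2) <- c('x', 'y', 'value')
datmat2[, 1:2] <- datmat2[, 1:2] / 1000            # scale grid indices back to [0, 1]
datmat2 <- datmat2[!is.na(datmat2$value), ]        # drop NAs before plotting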
mgcv can do spherical splines. This replaces akima (the chunk containing interp() isn't necessary).
library(mgcv)
spl1 <- gam(signal ~ s(x, y, bs = 'sos'), data = dat)
# fine grid, coarser is faster
datmat2 <- data.frame(expand.grid(x = seq(0, 1, 0.001), y = seq(0, 1, 0.001)))
resp <- predict(spl1, datmat2, type = "response")
datmat2$value <- resp
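From there, a sketch of feeding those predictions into the plotting approach above (reusing circledat and dat, and masking out points outside the head circle); with the fine 0.001 grid this can be slow:
# keep only grid points inside the head circle, as before
datmat2$incircle <- (datmat2$x - .5)^2 + (datmat2$y - .5)^2 < .5^2
datmat2 <- datmat2[datmat2$incircle, ]

ggplot(datmat2, aes(x, y, z = value)) +
  geom_tile(aes(fill = value)) +
  geom_contour(colour = 'white', alpha = 0.5) +
  scale_fill_distiller(palette = "Spectral") +
  geom_path(data = circledat, aes(x, y, z = NULL)) +
  geom_point(data = dat, aes(x, y, z = NULL, fill = NULL),
             shape = 21, colour = 'black', fill = 'white', size = 2) +
  theme_bw()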

geom_jitter removes a different number of points due to missing values each time the plot is run

geom_jitter in ggplot2 seems to remove a different number of points each time I plot the data. I suspect this is due to overplotting (stacked points)? E.g., if I create the data frame once and then run the ggplot command multiple times, I get a varying number of points reported as removed due to missing data (ranging from 0 to 1+). Is there a way to ensure a consistent number of missing points (or none)? I tried tinkering with the size and the jitter width/height, to no avail. Thanks!
d <- data.frame(a = rnorm(n = 100, mean = 0, sd = 1), b = rnorm(n = 100, mean = 0, sd = 1))
ggplot(d, aes(a, b)) +
  geom_point(position = position_jitter(width = 0.3, height = .3), size = 2) +
  theme(panel.background = element_blank()) +
  scale_x_continuous(limits = c(-3, 3)) +
  scale_y_continuous(limits = c(-3, 3))
The jitter is pushing the points out of the ranges you specify, and the noise is calculated with each run. Try jittering yourself, so it won't change every time, or remove the range constraints.
set.seed(0)
d <- data.frame(a = rep(-2:2, each=20), b=rnorm(100))
## Specify your own jitter: 0.1 in width, 1 in height in this example
d <- d + rnorm(nrow(d)*2, 0, sd=rep(c(0.1, 1), each=nrow(d)))
## Always 4 rows removed, unless you rejitter
ggplot(d, aes(a, b)) +
  geom_point(size = 2) +
  theme(panel.background = element_blank()) +
  scale_x_continuous(limits = c(-3, 3)) +
  scale_y_continuous(limits = c(-3, 3))
Edit
Actually much simpler, just set.seed prior to running what you have :)
set.seed(0)
ggplot(d, aes(a, b)) +
  geom_point(position = position_jitter(width = 0.3, height = .3), size = 2) +
  theme(panel.background = element_blank()) +
  scale_x_continuous(limits = c(-3, 3)) +
  scale_y_continuous(limits = c(-3, 3))
Another option is to not use the limits argument of scale_x_continuous. Instead, use the xlim and ylim arguments of coord_cartesian. This is the code that's meant for zooming into a portion of the plot. The limits argument in the x and y axis scales actually subsets the data that's to be plotted. Usually this makes little difference unless you're talking about statistical summaries that include data not visible on the plot.
Note: you won't get the warnings when your data points fall out of the graph.
d <- data.frame(a = rnorm(n = 100, mean = 0, sd = 1),
                b = rnorm(n = 100, mean = 0, sd = 1))
ggplot(d, aes(a, b)) +
  geom_point(position = position_jitter(width = 0.3, height = .3), size = 2) +
  theme(panel.background = element_blank()) +
  coord_cartesian(xlim = c(-3, 3), ylim = c(-3, 3))
Another, lesser known, option is to change the way scales handle their bounds, by setting the out of bounds (oob) argument.
This is not really my idea, but very much inspired by user axeman in this very similar thread.
library(ggplot2)
set.seed(0)
d <- data.frame(a = rnorm(n = 100, mean = 0, sd = 1), b = rnorm(n = 100, mean = 0, sd = 1))
ggplot(d, aes(a, b)) +
  geom_point(position = position_jitter(width = 0.3, height = .3), size = 2) +
  theme(panel.background = element_blank()) +
  scale_x_continuous(limits = c(-3, 3), oob = scales::squish) +
  scale_y_continuous(limits = c(-3, 3), oob = scales::squish)
Created on 2021-04-27 by the reprex package (v2.0.0)
