How to get a really periodic polar surface plot with ggplot

Sample data:
mydata="theta,rho,value
0,0.8400000,0.0000000
40,0.8400000,0.4938922
80,0.8400000,0.7581434
120,0.8400000,0.6675656
160,0.8400000,0.2616592
200,0.8400000,-0.2616592
240,0.8400000,-0.6675656
280,0.8400000,-0.7581434
320,0.8400000,-0.4938922
360,0.8400000,0.0000000
0,0.8577778,0.0000000
40,0.8577778,0.5152213
80,0.8577778,0.7908852
120,0.8577778,0.6963957
160,0.8577778,0.2729566
200,0.8577778,-0.2729566
240,0.8577778,-0.6963957
280,0.8577778,-0.7908852
320,0.8577778,-0.5152213
360,0.8577778,0.0000000
0,0.8755556,0.0000000
40,0.8755556,0.5367990
80,0.8755556,0.8240077
120,0.8755556,0.7255612
160,0.8755556,0.2843886
200,0.8755556,-0.2843886
240,0.8755556,-0.7255612
280,0.8755556,-0.8240077
320,0.8755556,-0.5367990
360,0.8755556,0.0000000
0,0.8933333,0.0000000
40,0.8933333,0.5588192
80,0.8933333,0.8578097
120,0.8933333,0.7553246
160,0.8933333,0.2960542
200,0.8933333,-0.2960542
240,0.8933333,-0.7553246
280,0.8933333,-0.8578097
320,0.8933333,-0.5588192
360,0.8933333,0.0000000
0,0.9111111,0.0000000
40,0.9111111,0.5812822
80,0.9111111,0.8922910
120,0.9111111,0.7856862
160,0.9111111,0.3079544
200,0.9111111,-0.3079544
240,0.9111111,-0.7856862
280,0.9111111,-0.8922910
320,0.9111111,-0.5812822
360,0.9111111,0.0000000
0,0.9288889,0.0000000
40,0.9288889,0.6041876
80,0.9288889,0.9274519
120,0.9288889,0.8166465
160,0.9288889,0.3200901
200,0.9288889,-0.3200901
240,0.9288889,-0.8166465
280,0.9288889,-0.9274519
320,0.9288889,-0.6041876
360,0.9288889,0.0000000
0,0.9466667,0.0000000
40,0.9466667,0.6275358
80,0.9466667,0.9632921
120,0.9466667,0.8482046
160,0.9466667,0.3324593
200,0.9466667,-0.3324593
240,0.9466667,-0.8482046
280,0.9466667,-0.9632921
320,0.9466667,-0.6275358
360,0.9466667,0.0000000
0,0.9644444,0.0000000
40,0.9644444,0.6512897
80,0.9644444,0.9997554
120,0.9644444,0.8803115
160,0.9644444,0.3450427
200,0.9644444,-0.3450427
240,0.9644444,-0.8803115
280,0.9644444,-0.9997554
320,0.9644444,-0.6512897
360,0.9644444,0.0000000
0,0.9822222,0.0000000
40,0.9822222,0.6751215
80,0.9822222,1.0363380
120,0.9822222,0.9125230
160,0.9822222,0.3576658
200,0.9822222,-0.3576658
240,0.9822222,-0.9125230
280,0.9822222,-1.0363380
320,0.9822222,-0.6751215
360,0.9822222,0.0000000
0,1.0000000,0.0000000
40,1.0000000,0.6989533
80,1.0000000,1.0729200
120,1.0000000,0.9447346
160,1.0000000,0.3702890
200,1.0000000,-0.3702890
240,1.0000000,-0.9447346
280,1.0000000,-1.0729200
320,1.0000000,-0.6989533
360,1.0000000,0.0000000"
Read it into a data frame:
foobar <- read.csv(text = mydata)
You can check (if you really want to!) that the data are periodic in the theta direction, i.e., for each given rho, the values at theta=0 and theta=360 are exactly the same. I would like to plot a nice polar surface plot, in other words an annulus colored according to value. I tried the following:
library(viridis) # just because I very much like viridis: if you don't want to install it, comment out this line and the scale_fill_viridis line, and uncomment the scale_fill_distiller line
library(ggplot2)
p <- ggplot(data = foobar, aes(x = theta, y = rho, fill = value)) +
geom_tile() +
coord_polar(theta = "x") +
scale_x_continuous(breaks = seq(0, 360, by = 45), limits=c(0,360)) +
scale_y_continuous(limits = c(0, 1)) +
# scale_fill_distiller(palette = "Oranges")
scale_fill_viridis(option = "plasma")
I'm getting this (image omitted):
Yuck! Why the nasty hole in the annulus? If I generate a foobar data frame with more rows (more theta and rho values), the hole gets smaller. This isn't a viable solution, both because computing data at more rho/theta values is costly and time-consuming, and because even with 100x100 = 10^4 rows I still get a hole. Also, with a bigger data frame, ggplot takes forever to render the plot: the combination of geom_tile and coord_polar is incredibly inefficient. Isn't there a way to get a nice-looking polar plot without wasting memory and CPU time?

Edit: I removed all the rows with theta=360 (they repeat the values at theta=0) and plotted:
foobar <- subset(foobar, theta != 360)  # drop the duplicated theta = 360 rows
ggplot(data = foobar, aes(x = theta, y = rho, fill = value)) +
geom_tile() +
coord_polar(theta = "x", start = -pi/9) +  # -pi/9 rad = -20 degrees, half of a 40-degree tile
scale_y_continuous(limits = c(0, 1)) +
scale_x_continuous(breaks = seq(0, 360, by = 45))
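The key change is that I removed limits from scale_x_continuous. geom_tile centres each tile on its theta value, so with 40-degree steps the tile at theta=0 spans -20 to 20 degrees and the one at theta=360 spans 340 to 380. With limits = c(0, 360), those edge tiles stick out past the limits and are most likely dropped, which leaves a wedge-shaped hole one tile wide; that would also explain why the hole shrinks as the theta spacing gets finer.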
That gives me (image omitted):
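geom_tile() centres each tile on its x value, so it is the x range of the tiles, not the theta values themselves, that has to cover the full circle. The sketch below checks this with layer_data() on a grid laid out like foobar (theta every 40 degrees with the 360 column left out, 10 rho values); the value column is placeholder data, not the question's. The tile edges should come out at -20 and 340 degrees, one full turn, and start = -pi/9 (minus 20 degrees) rotates the plot so that theta = 0 sits at the top. The sketch uses expand_limits(y = 0) instead of scale_y_continuous(limits = c(0, 1)); the assumption is that a hard upper limit of 1 would also drop the outermost ring, whose tiles extend slightly beyond rho = 1.

```r
library(ggplot2)

# Grid laid out like foobar: theta every 40 degrees (no duplicated 360
# column) and 10 rho values. The values are placeholders for illustration.
grid <- expand.grid(theta = seq(0, 320, by = 40),
                    rho = seq(0.84, 1, length.out = 10))
grid$value <- sin(grid$theta * pi / 180) * grid$rho

p <- ggplot(grid, aes(x = theta, y = rho, fill = value)) +
  geom_tile() +
  coord_polar(theta = "x", start = -pi / 9) +  # -pi/9 rad = -20 degrees
  expand_limits(y = 0)  # keep the hole in the middle without a hard upper limit

# layer_data() returns the built data, including each tile's edges
ld <- layer_data(p)
edges <- range(c(ld$xmin, ld$xmax))
print(edges)        # half a tile either side of theta = 0 and theta = 320
print(diff(edges))  # the tiles together cover one full turn
```

Because nothing falls outside the x range, no tile is dropped, and the coarse 9 x 10 grid is enough for a closed annulus.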


How to stop ggplot line plot adding fill

I am producing a ggplot which looks at a curve in a dataset. When I build the plot, ggplot automatically adds fill to the data on the negative side of the x axis. Script shown below (plot omitted).
library(ggplot2)
ggplot(df, aes(x = Var1, y = Var2)) +
geom_line() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = df$Var2[1])  # Var2 is not visible outside aes(), so use df$Var2
Using base R, I get the plot as it should look (plot omitted):
plot(x = df$Var1, y = df$Var2, type = "l",
xlab = "Var1", ylab = "Var2")
abline(v = 0)
abline(h = df$Var2[1])
If anyone could help identify why I might be getting the automatic fill and how I could make it stop, I would be very appreciative. I would like to make this work in ggplot so I can later animate the line as it is a time series that can be used to compare between other datasets from the same source.
I can add data if necessary, though the data set is 1561 observations long. Thanks in advance.
I guess you should try
ggplot(df, aes(x = Var1, y = Var2)) +
geom_path() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = df$Var2[1])
instead. geom_line() connects the points in order of the x variable, whereas geom_path() connects them in the order in which they appear in the data.
Take a look at this example
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_line()
The two points with x-coordinate -pi/2 will be connected first, creating a vertical black line. Next x = -pi/2 + 0.001 will be processed and so on. The x values will be processed in order.
Therefore you should use geom_path() to get the desired result
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_path()
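The ordering difference can also be checked directly. This sketch uses a made-up closed loop (not the question's data), builds it with both geoms, and reads the processed data back with layer_data(): geom_line() should return the rows re-sorted by x, while geom_path() keeps the original row order.

```r
library(ggplot2)

# A closed loop: x goes from 1 down to -1 and back up to 1,
# so the rows are NOT in x order
t <- seq(0, 2 * pi, length.out = 50)
loop <- data.frame(x = cos(t), y = sin(t))

p_line <- ggplot(loop, aes(x, y)) + geom_line()
p_path <- ggplot(loop, aes(x, y)) + geom_path()

# layer_data() returns each layer's data after ggplot2 has prepared it
line_x <- layer_data(p_line)$x
path_x <- layer_data(p_path)$x

print(is.unsorted(line_x))        # geom_line: rows re-sorted by x
print(all.equal(path_x, loop$x))  # geom_path: original row order kept
```

With the loop drawn by geom_line(), the upper and lower halves get interleaved by x, which produces the same "filled" look described in the question.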

Transforming the y-axis without changing raw data in ggplot2

I have a question about how to transform the y-axis in ggplot2. My plot has two lines and a scatter plot. For the scatter plot, I am most interested in the area around zero. Is there a way to enlarge the space between 0% and 5% and narrow the space between 20% and 30%?
I tried coord_trans(y = "log10") to get a log scale, but I have a lot of negative values, and with sqrt or log those negative values are removed. Do you have any suggestions?
Example of data points:
library(ggplot2)
df1 = data.frame(y = runif(200, min = -1, max = 1))
df1 = data.frame(x = 1:200, y = df1[order(abs(df1$y)), ])
ggplot(df1) +
geom_point(colour = "black", aes(x, y), size = 0.1)
I want more space between 0% and 5% and less space between 5% and 30%.
I have tried to use trans_new() to transform the axes.
library(scales) # for trans_new()
eps <- 1e-8
tn <- trans_new("logpeps",
function(x) (x+eps)^(3),
function(y) ((y)^(1/3) ),
domain=c(- Inf, Inf)
)
ggplot(df1)+ geom_point(colour = "black",aes(x,y) ,size = 0.1) +
# xlab("Observations sorted by PD in v3.1") + ylab("Absolute PD difference ") +
# ggtitle("Absolute PD for RiskCalc v4.0 relative to v3.1") +
scale_x_continuous(breaks = seq(0, round(rownum/1000)*1000, by = round(rownum/100)*10)) +
scale_y_continuous(limits = c(-yrange,yrange),breaks = c(-breaksY,breaksY),
sec.axis = sec_axis(~., breaks = c(-breaksY[2:length(breaksY)], breaksY), labels = scales::percent
)) +
# geom_line(data = df, aes(x,y[,3], colour = "blue"),size = 1) +
# geom_line(data = ds,aes(xval, yval,colour = "red"),size = 1) +
coord_trans(y = tn) +
scale_color_discrete(name = element_blank())
But it compresses the plot toward the center, which is the opposite of what I want. Then I tried y = y^3, but it gives an error:
ERROR: zero_range(range)
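Try a sign-preserving cube root transform on the y values (plain yVariable^(1/3) returns NaN for negative values in R):
aes(y = sign(yVariable) * abs(yVariable)^(1/3))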
or use trans_new() to define a new transformation (such as cube root, with pleasing breaks and labels).
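Because x^(1/3) is NaN for negative x in R, a cube-root axis for data with negative values needs a sign-preserving version. A minimal sketch with scales::trans_new(); the names signed_cbrt, signed_cube and signed_cbrt_trans are made up for illustration. It spreads out small magnitudes and compresses large ones, which is the opposite of the cubing transform tried in the question.

```r
library(scales)

# Sign-preserving cube root: plain x^(1/3) gives NaN for negative x in R
signed_cbrt <- function(x) sign(x) * abs(x)^(1/3)
# Its inverse: cubing already preserves the sign
signed_cube <- function(x) x^3

signed_cbrt_trans <- trans_new(
  name = "signed_cbrt",
  transform = signed_cbrt,
  inverse = signed_cube,
  domain = c(-Inf, Inf)
)

vals <- c(-0.3, -0.05, -0.01, 0, 0.01, 0.05, 0.3)
# Small magnitudes are pushed away from 0 (e.g. 0.01 -> about 0.215),
# large ones are pulled in, so the region near zero gets more room
print(signed_cbrt_trans$transform(vals))
# The inverse recovers the original values
print(signed_cbrt_trans$inverse(signed_cbrt_trans$transform(vals)))
```

On the plot, the transform would be passed to the y scale, e.g. scale_y_continuous(trans = signed_cbrt_trans, labels = scales::percent); in recent ggplot2 versions the argument is named transform instead of trans.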
A couple thoughts:
You can remove the empty edges of the plot like so:
scale_y_continuous(expand = c(0,0))
If you want to try the log transformation, just do (note that, as the question points out, this drops the negative values):
scale_y_log10()
If you want to focus the window:
scale_y_continuous(limits=c(-.15,.15), expand=c(0,0))
Also consider adding theme_bw() for a cleaner look

ggplot2: Why is it displaying the wrong values when set to log10 axis?

I'm using stat_summary to display the mean, and based on my calculations "type1, G-" should have a mean of ~10^7.3. That's the value I get when plotting without a log10 axis. But when I add the log10 axis, "type1, G-" suddenly shows a value of 10^6.5.
What's going on?
library(ggplot2)
library(reshape2) # for melt()
#Data
Type = rep(c("type1", "type2"), each = 6)
Gen = rep(rep(c("G-", "G+"), each = 3), 2)
A = c(4.98E+05, 5.09E+05, 1.03E+05, 3.08E+05, 5.07E+03, 4.22E+04, 6.52E+05, 2.51E+04, 8.66E+05, 8.10E+04, 6.50E+06, 1.64E+06)
B = c(6.76E+07, 3.25E+07, 1.11E+07, 2.34E+06, 4.10E+04, 1.20E+06, 7.50E+07, 1.65E+05, 9.52E+06, 5.92E+06, 3.11E+08, 1.93E+08)
df = melt(data.frame(Type, Gen, A, B))
#Correct, non-log10 version ("type1 G-" has a value over 1e+07)
ggplot(data = df, aes(x = Type, y = value)) +
stat_summary(fun = mean, geom = "bar", position = "dodge", aes(fill = Gen)) + # fun.y in ggplot2 < 3.3.0
scale_x_discrete(limits = c("type1")) +
coord_cartesian(ylim = c(10^7, 10^7.5))
#Incorrect, log10 version ("type1 G-" has a value under 1e+07)
ggplot(data = df, aes(x = Type, y = value)) +
stat_summary(fun = mean, geom = "bar", position = "dodge", aes(fill = Gen)) +
scale_y_log10()
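You want coord_trans. With scale_y_log10(), stat_summary() computes the mean of the log10-transformed values (i.e. the geometric mean), which is smaller than the arithmetic mean. As the coord_trans documentation says: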
# The difference between transforming the scales and
# transforming the coordinate system is that scale
# transformation occurs BEFORE statistics, and coordinate
# transformation afterwards.
However, you cannot make a barplot with this, since bars start at 0 and log10(0) is not defined. But barplots are usually not a good visualization anyway.
ggplot(data = df, aes(x = Type, y = value)) +
stat_summary(fun = mean, geom = "point", position = "identity", aes(color = Gen)) +
coord_trans(y = "log10", ylim = c(1e5, 1e8)) + # ylim replaces the deprecated limy
scale_y_continuous(breaks = 10^(5:8))
Obviously you should plot some kind of uncertainty information. I'd recommend a boxplot.
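The two numbers in the question can be reproduced by hand in base R. Taking the six "type1, G-" values from the sample data (three from column A and three from column B after melt()), the arithmetic mean lands near 10^7.3, while the mean of the log10 values, which is what stat_summary() sees once scale_y_log10() has transformed the data, lands near 10^6.5.

```r
# Values of the "type1, G-" group after melt(): its three A and three B entries
vals <- c(4.98E+05, 5.09E+05, 1.03E+05, 6.76E+07, 3.25E+07, 1.11E+07)

# Linear axis: stat_summary() averages the raw values (arithmetic mean)
arith_mean <- mean(vals)

# scale_y_log10(): values are log10-transformed BEFORE the stat runs, so the
# mean is taken on the log scale, which corresponds to the geometric mean
log_mean <- mean(log10(vals))

print(log10(arith_mean))  # about 7.27, the ~10^7.3 seen on the linear plot
print(log_mean)           # about 6.47, the ~10^6.5 seen on the log10 plot
```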

ggsave with arrangeGrob fails for large plots (+1 million observations) [duplicate]

I am trying to plot two variables where N=700K. The problem is that there is too much overlap, so the plot becomes mostly a solid block of black. Is there any way of having a grayscale "cloud" where the darkness of the plot is a function of the number of points in a region? In other words, instead of showing individual points, I want the plot to be a "cloud", with the more the number of points in a region, the darker that region.
One way to deal with this is alpha blending, which makes each point slightly transparent, so regions with more points plotted on them appear darker.
This is easy to do in ggplot2:
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
ggplot(df,aes(x=x,y=y)) + geom_point(alpha = 0.3)
Another convenient way to deal with this (and probably more appropriate for the number of points you have) is hexagonal binning (geom_hex() needs the hexbin package installed):
ggplot(df,aes(x=x,y=y)) + geom_hex()
And there is also regular old rectangular binning (image omitted), which is more like your traditional heatmap:
ggplot(df,aes(x=x,y=y)) + geom_bin2d()
An overview of several good options in ggplot2:
library(ggplot2)
x <- rnorm(n = 10000)
y <- rnorm(n = 10000, sd=2) + x
df <- data.frame(x, y)
Option A: transparent points
o1 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.05)
Option B: add density contours
o2 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.05) +
geom_density_2d()
Option C: add filled density contours
(Note that the points distort the perception of the colors underneath, may be better without points.)
o3 <- ggplot(df, aes(x, y)) +
stat_density_2d(aes(fill = after_stat(level)), geom = 'polygon') +
scale_fill_viridis_c(name = "density") +
geom_point(shape = '.')
Option D: density heatmap
(Same note as C.)
o4 <- ggplot(df, aes(x, y)) +
stat_density_2d(aes(fill = after_stat(density)), geom = 'raster', contour = FALSE) +
scale_fill_viridis_c() +
coord_cartesian(expand = FALSE) +
geom_point(shape = '.', col = 'white')
Option E: hexbins
(Same note as C.)
o5 <- ggplot(df, aes(x, y)) +
geom_hex() +
scale_fill_viridis_c() +
geom_point(shape = '.', col = 'white')
Option F: rugs
Possibly my favorite option. Not quite as flashy, but visually simple and simple to understand. Very effective in many cases.
o6 <- ggplot(df, aes(x, y)) +
geom_point(alpha = 0.1) +
geom_rug(alpha = 0.01)
Combine in one figure:
cowplot::plot_grid(
o1, o2, o3, o4, o5, o6,
ncol = 2, labels = 'AUTO', align = 'v', axis = 'lr'
)
You can also have a look at the ggsubplot package (note that it has since been archived on CRAN and may not work with current ggplot2 versions). This package implements features which were presented by Hadley Wickham back in 2011 (http://blog.revolutionanalytics.com/2011/10/ggplot2-for-big-data.html).
(In the following, I include the "points"-layer for illustration purposes.)
library(ggplot2)
library(ggsubplot)
# Make up some data
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each=5000),
xvar = c(rep(1:20,250) + rnorm(5000,sd=5),rep(16:35,250) + rnorm(5000,sd=5)),
yvar = c(rep(1:20,250) + rnorm(5000,sd=5),rep(16:35,250) + rnorm(5000,sd=5)))
# Scatterplot with subplots (simple)
ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1) +
geom_subplot2d(aes(xvar, yvar,
subplot = geom_bar(aes(rep("dummy", length(xvar)), ..count..))), bins = c(15,15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
However, this feature really shines if you have a third variable to control for.
# Scatterplot with subplots (including a third variable)
ggplot(dat, aes(x=xvar, y=yvar)) +
geom_point(shape=1, aes(color = factor(cond))) +
geom_subplot2d(aes(xvar, yvar,
subplot = geom_bar(aes(cond, ..count.., fill = cond))),
bins = c(15,15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
Or another approach would be to use smoothScatter():
smoothScatter(dat[2:3])
Alpha blending is easy to do with base graphics as well.
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
with(df, plot(x, y, col="#00000033"))
The first six hex digits after the # are the color in RGB, and the last two are the opacity, also in hex, so 33 (51 in decimal) means 51/255 = 20% opaque.
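Base R can decode or build these hex colours, which avoids doing the hex arithmetic by hand. A small sketch using only grDevices functions (col2rgb(), rgb(), adjustcolor()):

```r
# Decode the alpha channel of the hex colour "#00000033"
a <- col2rgb("#00000033", alpha = TRUE)["alpha", 1]
print(a)        # alpha as an integer from 0 to 255
print(a / 255)  # the same opacity as a fraction

# Build the same semi-transparent black from an opacity instead of hex
print(rgb(0, 0, 0, alpha = 0.2))
print(adjustcolor("black", alpha.f = 0.2))
```

In the plot call this would be, for example, with(df, plot(x, y, col = adjustcolor("black", alpha.f = 0.2))).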
You can also use density contour lines (ggplot2):
df <- data.frame(x = rnorm(15000),y=rnorm(15000))
ggplot(df,aes(x=x,y=y)) + geom_point() + geom_density2d()
Or combine density contours with alpha blending:
ggplot(df,aes(x=x,y=y)) +
geom_point(colour="blue", alpha=0.2) +
geom_density2d(colour="black")
You may find the hexbin package useful. From the help page of hexbinplot:
library(hexbin)
mixdata <- data.frame(x = c(rnorm(5000),rnorm(5000,4,1.5)),
y = c(rnorm(5000),rnorm(5000,2,3)),
a = gl(2, 5000))
hexbinplot(y ~ x | a, mixdata)
geom_pointdensity from the ggpointdensity package (recently developed by Lukas Kremer and Simon Anders (2019)) lets you visualize density and individual data points at the same time:
library(ggplot2)
# install.packages("ggpointdensity")
library(ggpointdensity)
df <- data.frame(x = rnorm(5000), y = rnorm(5000))
ggplot(df, aes(x=x, y=y)) + geom_pointdensity() + scale_color_viridis_c()
My favorite method for plotting this type of data is the one described in this question - a scatter-density plot. The idea is to do a scatter-plot but to colour the points by their density (roughly speaking, the amount of overlap in that area).
It simultaneously:
clearly shows the location of outliers, and
reveals any structure in the dense area of the plot.
Here is the result from the top answer to the linked question (image omitted).
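The linked question is not reproduced here, so this is only a sketch of one common way to build a scatter-density plot, assuming the MASS package (shipped with R) for the 2D kernel density estimate: estimate the density on a grid with MASS::kde2d(), look up the grid cell each point falls in, and map that value to colour. get_density() is a hypothetical helper name.

```r
library(MASS)     # kde2d()
library(ggplot2)

# Estimate a 2D kernel density on an n x n grid, then look up the grid
# cell each point falls into to get a per-point density value
get_density <- function(x, y, n = 100) {
  dens <- MASS::kde2d(x, y, n = n)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  dens$z[cbind(ix, iy)]
}

set.seed(1)
df <- data.frame(x = rnorm(5000), y = rnorm(5000))
df$density <- get_density(df$x, df$y)

# Sort so the densest points are drawn last (on top)
df <- df[order(df$density), ]
p <- ggplot(df, aes(x, y, colour = density)) +
  geom_point(size = 0.5) +
  scale_colour_viridis_c()
```

Sorting by density before plotting matters with many points: otherwise sparse points drawn later can hide the colour structure of the dense core.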

R: prevent break in line showing time series data using ggplot geom_line

Using ggplot2 I want to draw a line that changes colour after a certain date. I expected this to be simple, but I get a break in the line at the point where the colour changes. Initially I thought this was a problem with group (as per this question; this other question also looked relevant but wasn't quite what I needed). Having messed around with the group aesthetic for 30 minutes, I can't fix it, so if anybody can point out the obvious mistake...
Code:
library(ggplot2)
set.seed(1111)
mydf <- data.frame(mydate = seq(as.Date('2013-01-01'), by = 'day', length.out = 10),
y = runif(10, 100, 200))
mydf$cond <- ifelse(mydf$mydate > as.Date('2013-01-05'), "red", "blue")
ggplot(mydf, aes(x = mydate, y = y, colour = cond)) +
geom_line() +
scale_colour_identity()
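If you set group=1, then 1 will be used as the group value for all data points, and the line will join up. (By default ggplot2 splits the data into one group per value of a discrete aesthetic such as colour, so without it you get two separate lines.)
ggplot(mydf, aes(x = mydate, y = y, colour = cond, group = 1)) +
geom_line() +
scale_colour_identity()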
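The grouping can be checked with layer_data(), which returns the data after ggplot2 has assigned groups. This sketch rebuilds the question's data; n_split and n_joined are just illustrative names. With group = 1 the colour still changes within the single line: ggplot2 then draws it as segments, each coloured by the point it starts from, so the segment between the last blue day and the first red day is drawn blue.

```r
library(ggplot2)

set.seed(1111)
mydf <- data.frame(mydate = seq(as.Date("2013-01-01"), by = "day", length.out = 10),
                   y = runif(10, 100, 200))
mydf$cond <- ifelse(mydf$mydate > as.Date("2013-01-05"), "red", "blue")

# Default: the discrete colour variable also defines the groups
split_plot <- ggplot(mydf, aes(x = mydate, y = y, colour = cond)) +
  geom_line() +
  scale_colour_identity()

# group = 1 puts every row in the same group, so one line is drawn
joined_plot <- ggplot(mydf, aes(x = mydate, y = y, colour = cond, group = 1)) +
  geom_line() +
  scale_colour_identity()

n_split  <- length(unique(layer_data(split_plot)$group))
n_joined <- length(unique(layer_data(joined_plot)$group))
print(c(split = n_split, joined = n_joined))
```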
