I was wondering how to show the progress of a ggplot2 or ggmap operation while it is calculating. I'm working with some very large shapefiles and they can take several minutes to display the plot. I know there are several packages and functions that allow you to insert a progress bar or percentage during a calculation (e.g. the "progress" package), but I can't figure out how to insert it inside the ggplot operation.
So, example below:
library (ggplot2)
library (sf)
ggplot() +
geom_sf(data = read_sf("myshapefile.shp"), size = 1, color = "black", fill = "red") +
ggtitle("Testplot") +
coord_sf()
This shapefile takes several minutes to render into a plot. Rather than just sitting there waiting for it to show up, not knowing when it will finish or whether it is stuck, I would like to see some kind of progress (bar or percentage, it doesn't matter, e.g. showing when it is 10%, 20%, 30%, etc. done).
Hadley Wickham, the creator of ggplot2, says a progress bar is not possible here.
His suggestion is to simplify the shapefile, which removes vertices within a given tolerance and will usually make it render faster without any perceptible difference in the figure. It can be done like so:
library (ggplot2)
library (sf)
my_shapefile <- read_sf("myshapefile.shp")
# You'll have to test different tolerance levels here. Higher
# values will produce more rigid looking shapes which will render faster.
# I would try values of 0.01, 0.1, 1, and 10.
my_shapefile <- st_simplify(my_shapefile, dTolerance = 0.1)
ggplot() +
geom_sf(data = my_shapefile, size = 1, color = "black", fill = "red") +
ggtitle("Testplot") +
coord_sf()
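To pick a tolerance, it can help to see how many vertices each candidate value leaves behind. The following is a minimal sketch and not part of the original answer: it assumes the North Carolina demo shapefile that ships with sf as a stand-in for the real data, reprojects it to a metre-based CRS (EPSG:32119, which suits this demo data only) so that dTolerance is in metres, and uses a hypothetical helper count_vertices().

```r
library(sf)

# Stand-in data: the North Carolina counties shapefile bundled with sf.
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Reproject to a metre-based CRS so that dTolerance is expressed in metres.
nc_proj <- st_transform(nc, 32119)

# Hypothetical helper: total number of vertices across all geometries.
count_vertices <- function(x) nrow(st_coordinates(x))

# Candidate tolerances in metres (illustrative values only).
tolerances <- c(10, 100, 1000)
vertex_counts <- sapply(tolerances, function(tol) {
  count_vertices(st_simplify(nc_proj, dTolerance = tol))
})

# Vertex count before simplification (tolerance 0) and after each tolerance.
data.frame(
  tolerance = c(0, tolerances),
  vertices  = c(count_vertices(nc_proj), vertex_counts)
)
```

Fewer vertices generally means less drawing work, so the table gives a rough idea of which tolerance is worth trying first before checking the result visually.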
Related
I'm trying to plot predicted draws from a brms model using ggdist, specifically stat_slab, and having issues with coord_cartesian to zoom in. coord_cartesian succeeds in cropping the x-axis on the lower end, i.e. the ggdist object is displayed correctly if I adjust the lower xlim value from 0 to 50. However, when I reduce the upper xlim value (e.g. from 2000 to 1000), the slab fill disappears from those stat_slabs whose tails go beyond the upper xlim value. It seems that coord_cartesian fails at the upper end of the distribution. Here's code that should show what happens:
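library(ggplot2)
library(ggdist) # stat_slab(), theme_tidybayes()
library(cowplot) # panel_border()
theme_set(theme_tidybayes() + panel_border())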
set.seed(1)
vars = data.frame(year = rep(c("before","after"), each = 1100),
treatment = rep(c("low","high","low","high"), each = 550))
dat = data.frame(vars, .value = rgamma(nrow(vars),
scale = 100, shape = 4))
ggplot(dat,
aes(x = .value, y = treatment, group= year, fill=year, color = year)) +
stat_slab(alpha=0.6, slab_color = "black") +
#coord_cartesian(xlim = c(0, 2000)) + # all values
#coord_cartesian(xlim = c(300, 2000)) + # success at lower end
#coord_cartesian(xlim = c(0, 1000)) + # fill disappears in one slab
coord_cartesian(xlim = c(0, 900)) -> reprex1 # fill disappears in 3 slabs
plot that I get when running the above code
In addition, in my actual plots I would like to use slab_color=NA, and use only fill and manual fill color scales - so it's really unfortunate that the fill disappears... I really haven't found a solution, so thank you if someone can help.
EDIT: The problem seems to be somehow associated with R graphics and plotting window size. The smaller the window or plotting area, the more fills appear in the slabs, I think. Two examples with different sizes:
reprex1
tiff(paste0(getwd(),"/reprex1.tiff"),
res=600, compression="lzw",
height = 9, width = 13, units = "in")
print(reprex1)
dev.off()
all fills disappear when plot size is large,
reprex1
tiff(paste0(getwd(),"/reprex1.tiff"),
res=600, compression="lzw",
height = 3, width = 5, units = "in")
print(reprex1)
dev.off()
while in the smaller sized plot already two slabs are filled. What could cause this?
I'm using R version 4.1.0 (2021-05-18), ggdist 2.4.1 and ggplot2 3.3.5. All packages are up to date. I have the problem both in base R and RStudio.
BR,
Maria
[Short rationale for why I want to zoom in: My data has a lot of variation, and the tails of the slabs are massive, making it difficult to see the differences in year*treatment means. Therefore, I want to zoom in the plot a bit to focus on the high-density area]
Ah okay, I have no idea why, but I think this is related to alpha values on the Windows graphics device (which it looks like you are using).
If I run your code on the Cairo graphics device, I get the expected output:
But if I run it on the Windows graphics device, I get this:
If I remove alpha=0.6 from the call to stat_slab(), the densities come back:
I'm not sure what's going on, but it appears to be some interaction between alpha fills and clipping on the Windows graphics device in R. I'm not sure there's a fix at the ggdist level.
My main suggestion would be to switch to using the Cairo graphics device, which produces nicer-looking plots anyway (the antialiasing on plots generated on the Windows graphics device is quite ugly --- notice the jagged lines in its plots and the smooth lines in the Cairo plots).
If you are using RStudio 1.4+ you can change to the Cairo device by default in the Settings panel:
And if you are saving files as PNGs you can pass type = "cairo" to the png() function.
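As a minimal sketch of that last point (the file name, dimensions and the drawn shape are made up for illustration), the following writes a plot with a semi-transparent fill through the Cairo-based PNG device, falling back to the platform default when the R build has no Cairo support:

```r
# Write to a temporary file; the file name is only an example.
out_file <- file.path(tempdir(), "alpha_fill_cairo.png")

# Use the Cairo-based device when this R build supports it,
# otherwise fall back to the platform default device type.
if (capabilities("cairo")) {
  png(out_file, width = 5, height = 3, units = "in", res = 150, type = "cairo")
} else {
  png(out_file, width = 5, height = 3, units = "in", res = 150)
}

# A semi-transparent filled shape, similar in spirit to a slab with alpha = 0.6.
x <- seq(0, 1000, length.out = 200)
y <- dgamma(x, shape = 4, scale = 100)
plot(x, y, type = "n", xlab = ".value", ylab = "density")
polygon(c(x, rev(x)), c(y, rep(0, length(y))),
        col = adjustcolor("red", alpha.f = 0.6), border = NA)
dev.off()

file.exists(out_file)
```

With a ggplot object, print(reprex1) would take the place of the plot() and polygon() calls. The tiff() device used in the question accepts the same type argument.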
I am trying to use ggbio to plot gene transcripts. I want to plot a very specific range so it matches my ggplot2 plots. The problem is my example plot ends up having range of 133,567,500-133,570,000 regardless of the GRange and whether I specify xlim or not.
This example should only plot a small bit of intron (the thin arrowed line) but instead plots the full 2 exons and intron in between. I believe autoplot wants to plot the entire transcript or transcripts present in the range and widens the range to accommodate for that.
library(EnsDb.Hsapiens.v86)
library(ggbio)
ensdb <- EnsDb.Hsapiens.v86
mut<-GRanges("10", IRanges(133568909, 133569095))
gene <- autoplot(ensdb, which=mut, names.expr="gene_name",xlim=c(133568909,133569095))
gene.gg <- gene@ggplot
png("test_gene_plot_5.png")
gene.gg
dev.off()
Is there any way to override this? I've looked at the manual page for autoplot and I couldn't narrow down an option that would fix it. Others have said to use xlim, but that does not seem to change anything.
I like ggbio because it can make a ggplot2 object to be plotted along with other ggplot2 objects. I have not seen an example for that with other approaches like Gvis. But I would entertain other approaches if they could be combined with my existing plots.
Thanks!
Amy
It kind of depends whether you want clipped or squished data. Usually autoplot outputs a ggplot object at some point that can be manipulated as such.
For squished data:
library(GenomicRanges) # just to be sure start and end work
gene@ggplot +
scale_x_continuous(limits = c(start(mut), end(mut)), oob = scales::squish)
For clipped data:
gene@ggplot +
coord_cartesian(xlim = c(start(mut), end(mut)))
But to be totally honest, I'm unsure whether this is the most informative way to communicate that you are plotting the internals of an intron.
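To make the difference between the out-of-bounds behaviours concrete, here is a small sketch using the scales package directly. The window is the range from the question; the three feature positions are hypothetical values chosen so that two of them fall outside the window.

```r
library(scales)

# Requested plotting window (the range used in the question).
window <- c(133568909, 133569095)

# Hypothetical feature coordinates: the first and last lie outside the window.
positions <- c(133567500, 133569000, 133570000)

# squish(): out-of-bounds values are moved to the nearest limit.
squished <- squish(positions, range = window)

# censor(): out-of-bounds values become NA. This is the default oob
# behaviour of continuous scales when limits are set.
censored <- censor(positions, range = window)

squished
censored
```

coord_cartesian() does neither of these: the data are left untouched and only the visible region changes, which is why it is described as clipping.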
Alternatively, I've written a gene model geom at some point that doesn't work through the autoplot methods (which can sometimes be a pain if you want to customise everything). Downside is that you'd have to do some manual gene searching and setting aesthetics. Upside is that it works like most other geoms and is therefore easy to combine with some other data.
library(ggnomics) # from: https://github.com/teunbrand/ggnomics
# Finding a gene's exons manually
my_gene <- transcriptsByOverlaps(EnsDb.Hsapiens.v86, mut)
my_gene <- exonsByOverlaps(EnsDb.Hsapiens.v86, my_gene)
my_gene <- as.data.frame(my_gene)
some_other_data <- data.frame(
x = seq(start(mut), end(mut), by = 10),
y = cumsum(rnorm(19))
)
ggplot(some_other_data) +
geom_line(aes(x, y)) +
geom_genemodel(data = my_gene,
aes(xmin = start, xmax = end,
y = max(some_other_data$y) + 1,
group = 1, strand = strand)) +
coord_cartesian(xlim = c(start(mut), end(mut)))
Hope that helped!
I am trying to generate a plot in which every point stands for an event. Color, size and facet_grid are used to convey additional information visually. The graph works in ggplot2, but it is often important to know the exact numbers, so an interactive version is needed that lets you hover over a point and get the info. I tried to convert the plot into an interactive version with the function ggplotly from the plotly package. The problem is that the legend does not just display the different states of the attributes used; it contains every existing combination of them. In addition, it does not display the info from geom_rect.
I found related/similar questions, but they used the function plot_ly rather than ggplotly, or did not have an answer.
Following, the same problem illustrated with the mtcars dataset:
library(plotly)
g = ggplot(mtcars,aes(x=mpg,y=disp,color = as.factor(cyl),size =as.factor(gear))) +
geom_point() +
geom_text(label = c(rep("A",nrow(mtcars)-5),rep("B",5)),color = "black",size=4) +
geom_rect(data=data.frame(name="zone",Start=20,End = 30,ymin = -Inf,ymax = Inf),aes(xmin=Start, xmax=End, ymin=ymin, ymax=ymax,fill=name),inherit.aes = FALSE,alpha=0.3)+
facet_grid(vs~am)
g
This is the result, and how it should look: ggplot graph
Now using ggplotly
ggplotly(g)
This is the result: ggplotly graph
(1) The legend is now a combination of the different attributes used for Color and Size
(2) geom_rect is in the legend but didn’t get displayed in the graph
Does anyone know how to get the same graph in ggplotly as in ggplot2? I am grateful for every hint. Thanks
Dave
I do not know how to fix the combination of legends when you use ggplotly, but I can fix the second problem: if you do not use Inf and -Inf, the geom_rect will work:
ggplotly(ggplot(mtcars,aes(x=mpg,y=disp,color = as.factor(cyl),size =as.factor(gear))) +
geom_rect(aes( xmin=20,
xmax=30,
ymin=0,
ymax=max(mtcars$disp),
fill="Name"),
inherit.aes = FALSE, alpha=0.3) +
geom_point() +
geom_text(label = c(rep("A",nrow(mtcars)-5),rep("B",5)),color = "black",size=4) +
facet_grid(vs~am))
However, the legends are bad.
I would suggest using subplot to create the same thing in Plotly, and I think this link Ben mentioned will help you create each subplot. One thing to mention is that I had trouble showing different sizes in the legend in plotly: while the size of the markers will differ, there will not be a legend for the size scale. Maybe a scale will be a better option.
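To see how many entries the combined legend described in the question has to hold, one can count the combinations of the color and size variables that actually occur in the data. This is only a sketch of the bookkeeping, under the assumption (taken from the question's description) that the legend gets one entry per existing combination; it does not change what ggplotly draws.

```r
# Distinct (cyl, gear) pairs that actually occur in mtcars.
combos <- unique(mtcars[, c("cyl", "gear")])
combos <- combos[order(combos$cyl, combos$gear), ]

# Number of combined legend entries to expect under the assumption above.
nrow(combos)

# For comparison: the number of levels of each variable on its own,
# which is what the two separate ggplot2 legends show.
length(unique(mtcars$cyl))
length(unique(mtcars$gear))
```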
I've got an issue creating a map with ggplot2 onto which I project points using geom_point. When exporting to PDF or another format, the point size varies (because it is absolute and not relative to the axes). I searched for how to change that and found a lot of answers saying that this is on purpose, because otherwise the points would turn into ellipses each time the axis proportions change. I understand that; however, because I work on a map, I use coord_fixed to fix the output and avoid distortions of my map, so if I were able to fix the point size relative to the plot size, it wouldn't be a problem.
Is there a way to do that? I've read some interesting suggestions to use geom_polygon to artificially create ellipses. But I have two problems with this method:
First, I don't know how to implement that with my data. I know where the centers of my points are, but how could I then define a filled circular polygon around each center?
Second, I have used scale_size_continuous to plot smaller or bigger points relative to another variable. How could I implement that with geom_polygon?
In summary: I would be happy either with a way to specify the point size in a relative unit, or with some help understanding how I can create the same thing with geom_polygon.
I have tried to include a small reproducible example here. It is only an example; the problem with my data is that I have a lot of small values close together (mainly 1, like the small dot in the reproducible example). They look fine on screen, but when exporting they can become much bigger and cause a lot of overplotting, which is why I need to fix this ratio.
Link for the map informations and second link for map informations
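library(dplyr) # distinct(), group_by(), filter(), arrange(), bind_cols()
library(ggplot2) # ggplot(), fortify()
library(rgdal) # readOGR()
dat <- data.frame(postcode=c(3012, 2000, 1669, 4054, 6558), n=c(1, 20, 40, 60, 80))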
ch <- read.csv("location/PLZO_CSV_LV03/PLZO_CSV_LV03.csv", sep=";")#first link, to attribute a geographical location for each postcode
ch <- ch%>%
distinct(PLZ, .keep_all=TRUE)%>%
group_by(PLZ, N, E)%>%
summarise
ch <- ch%>%
filter(PLZ %in% dat$postcode)
ch <- ch%>%
arrange(desc(as.numeric(PLZ)))
dat <- dat%>%
arrange(desc(as.numeric(postcode)))
datmap <- bind_cols(dat, ch)
ch2 <- readOGR("location/PLZO_SHP_LV03/PLZO_PLZ.shp")#second link, to make the shape of the country
ch2 <- fortify(ch2)
a <- ggplot()+
geom_polygon(data=ch2, aes(x=long, y=lat, group=group), colour="grey75", fill="grey75")+
geom_jitter(data=datmap, aes(x=E, y=N, group=FALSE, size=n), color=c("red"))+ #here I put geom_jitter, but geom_point is fine too
scale_size_continuous(range=c(0.7, 5))+
coord_fixed()
print(a)
Thanks in advance for the help!
You can use ggsave() to save the last plot and adjust the scaling factor used for points/lines etc. Try this:
ggplot(data = ch2) +
geom_polygon(aes(x=long, y=lat, group=group),
colour="grey85", fill="grey90") +
geom_point(data=datmap, aes(x=E, y=N, group=FALSE, size=n),
color=c("red"), alpha = 0.5) +
scale_size_continuous(range=c(0.7, 5)) +
coord_fixed() +
theme_void()
ggsave(filename = 'plot.pdf', scale = 2, width = 3, height = 3)
Play around with the scale parameter (and optionally the width and height) until you are happy with the result.
DO NOT use geom_jitter(): this will add random XY variation to your points. To deal with overplotting you can try adding transparency - I added an alpha parameter for this. I also used theme_void() to get rid of axes and background.
Your shape file with map information is quite heavy: you can try a simple one with Swiss cantons, like this one.
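For the geom_polygon route raised in the question, circles can be built in map units so that they scale with the map rather than with the output device. The following is a minimal base R sketch under stated assumptions: circle_polygons() is a hypothetical helper, the centres are made-up coordinates rather than real postcode locations, and the radius mapping (1000 times the square root of n) is an arbitrary choice for illustration.

```r
# Hypothetical helper: one polygon (ring of vertices) per input centre.
# x, y are the centres and r the radii, all in the same units as the map axes.
circle_polygons <- function(x, y, r, n_vertices = 60) {
  angles <- seq(0, 2 * pi, length.out = n_vertices + 1)[-1]
  rings <- lapply(seq_along(x), function(i) {
    data.frame(
      id = i,
      x  = x[i] + r[i] * cos(angles),
      y  = y[i] + r[i] * sin(angles)
    )
  })
  do.call(rbind, rings)
}

# Made-up centres (not real postcode coordinates) with example counts.
centres <- data.frame(E = c(600000, 650000, 700000),
                      N = c(200000, 220000, 180000),
                      n = c(1, 20, 80))

# Map n to a radius in map units; the square root makes area scale with n.
centres$radius <- 1000 * sqrt(centres$n)

circles <- circle_polygons(centres$E, centres$N, centres$radius)

# With ggplot2 loaded, the circles can be drawn as ordinary polygons:
# ggplot(circles, aes(x, y, group = id)) +
#   geom_polygon(fill = "red", alpha = 0.5) +
#   coord_fixed()
```

Because the radius is computed from n by hand, this takes over the role that scale_size_continuous plays for geom_point, at the cost of losing the automatic size legend.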
I've got a polar plot which uses geom_smooth(). The smoothed loess line though is very small and rings around the center of the plot. I'd like to "zoom in" so you can see it better.
Using something like scale_y_continuous(limits = c(-.05,.7)) will make the geom_smooth ring bigger, but it will also alter it because it will recompute with the datapoints limited by the limits = c(-.05,.7) argument.
For a Cartesian plot I could use something like coord_cartesian(ylim = c(-.05,.7)) which would clip the chart but not the underlying data. However I can see no way to do this with coord_polar()
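The difference between the two approaches can be reproduced outside ggplot2. This is a minimal sketch with made-up data, using base R's loess() as a stand-in for the smoother: dropping the points outside the limits before fitting (what scale limits do) gives a different curve from fitting on all points and only cropping the view afterwards (what coord_cartesian does).

```r
set.seed(1)

# Made-up data standing in for the polar plot's angle and radius values.
d <- data.frame(
  theta  = seq(-170, 170, length.out = 200),
  values = rnorm(200)
)

limits <- c(-0.05, 0.7)
inside <- d$values >= limits[1] & d$values <= limits[2]

# Scale-limit behaviour: points outside the limits are removed before fitting.
d_in <- d[inside, ]
fit_limited <- loess(values ~ theta, data = d_in)

# Coord-zoom behaviour: the fit uses every point; only the view is cropped later.
fit_all <- loess(values ~ theta, data = d)

# Compare the two fitted curves at a few angles covered by both fits.
grid <- data.frame(theta = seq(min(d_in$theta), max(d_in$theta), length.out = 5))
pred_limited <- predict(fit_limited, newdata = grid)
pred_all     <- predict(fit_all, newdata = grid)

cbind(grid, pred_all, pred_limited)
```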
Any ideas? I thought there might be a way to do this with grid.clip() in the grid package but I am not having much luck.
What my plot looks like now, note "higher" red line:
What I'd like to draw:
What I get when I use scale_y_continuous() note "higher" blue line, also it's still not that big.
I haven't figured out a way to do this directly in coord_polar, but this can be achieved by modifying the ggplot_build object under the hood.
First, here's an attempt to make a plot like yours, using the fake data provided at the bottom of this answer.
library(ggplot2)
plot <- ggplot(data, aes(theta, values, color = series, group = series)) +
geom_smooth() +
scale_x_continuous(breaks = 30*-6:6, limits = c(-180,180)) +
coord_polar(start = pi, clip = "on") # use "off" to extend plot beyond axes
plot
Here, my Y (or r for radius) axis ranges from about -2.4 to 4.3.
We can confirm this by looking at the associated ggplot_build object:
# Create ggplot_build object and look at radius range
plot_build <- ggplot_build(plot)
plot_build[["layout"]][["panel_params"]][[1]][["r.range"]]
# [1] -2.385000 4.337039
If we redefine the range of r and plot that, we get what you're looking for, a close-up of the plot.
# Here we change the 2nd element (max) of r.range from 4.337 to 1
plot_build[["layout"]][["panel_params"]][[1]][["r.range"]][2] <- 1
plot2 <- ggplot_gtable(plot_build)
plot(plot2)
Note, this may not be a perfect solution, since this seems to introduce some image cropping issues that I don't know how to address. I haven't tested to see if those can be overcome using ggsave or perhaps by further modifying the ggplot_build object.
Sample data used above:
set.seed(4.2)
data <- data.frame(
series = as.factor(rep(c(1:2), each = 10)),
theta = rep(seq(from = -170, to = 170, length.out = 10), times = 2),
values = rnorm(20, mean = 0, sd = 1)
)