ggplot geom_point varying size with window size - r

I'm having an issue with a map created in ggplot2, on top of which I plot points using geom_point. When I export to PDF or another format, the point size varies relative to the map (because it is absolute rather than axis-relative). I searched for a way to change that and found many answers saying this is on purpose: otherwise the points would turn into ellipses every time the axis proportions change. I understand that; however, because I work on a map, I use coord_fixed to fix the aspect ratio and avoid distorting the map, so fixing the point size relative to the plot size would not be a problem.
Is there a way to do that? I've read some interesting suggestions to use geom_polygon to create the ellipses artificially, but I have two problems with this method:
First, I don't know how to implement it with my data. I know where the centers of my points are, but how do I define a filled circular polygon around each center?
Second, I have used scale_size_continuous to make points smaller or bigger according to another variable. How could I implement that with geom_polygon?
In summary: I would be happy either with a way to work around the fact that point size cannot be given in relative units, or with some help understanding how to create the same thing with geom_polygon.
I've tried to include a small reproducible example below. It is only an example; the problem with my data is that I have a lot of small values close together (mainly 1, like the small dot in the reproducible example). They look fine on screen, but when exporting they can become much bigger and cause a lot of overplotting, which is why I need to fix this ratio.
Link to the map information and second link to the map information
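library(dplyr) # %>%, distinct(), group_by(), summarise(), filter(), arrange(), bind_cols()
library(ggplot2)
library(rgdal) # readOGR()
dat <- data.frame(postcode=c(3012, 2000, 1669, 4054, 6558), n=c(1, 20, 40, 60, 80))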
ch <- read.csv("location/PLZO_CSV_LV03/PLZO_CSV_LV03.csv", sep=";")#first link, to attribute a geographical location for each postcode
ch <- ch%>%
distinct(PLZ, .keep_all=TRUE)%>%
group_by(PLZ, N, E)%>%
summarise
ch <- ch%>%
filter(PLZ %in% dat$postcode)
ch <- ch%>%
arrange(desc(as.numeric(PLZ)))
dat <- dat%>%
arrange(desc(as.numeric(postcode)))
datmap <- bind_cols(dat, ch)
ch2 <- readOGR("location/PLZO_SHP_LV03/PLZO_PLZ.shp")#second link, to make the shape of the country
ch2 <- fortify(ch2)
a <- ggplot()+
geom_polygon(data=ch2, aes(x=long, y=lat, group=group), colour="grey75", fill="grey75")+
geom_jitter(data=datmap, aes(x=E, y=N, group=FALSE, size=n), color=c("red"))+ #here I put geom_jitter, but geom_point is fine too
scale_size_continuous(range=c(0.7, 5))+
coord_fixed()
print(a)
Thanks in advance for the help!

You can use ggsave() to save the last plot and adjust the scaling factor used for points/lines etc. Try this:
ggplot(data = ch2) +
geom_polygon(aes(x=long, y=lat, group=group),
colour="grey85", fill="grey90") +
geom_point(data=datmap, aes(x=E, y=N, group=FALSE, size=n),
color=c("red"), alpha = 0.5) +
scale_size_continuous(range=c(0.7, 5)) +
coord_fixed() +
theme_void()
ggsave(filename = 'plot.pdf', scale = 2, width = 3, height = 3)
Play around with the scale parameter (and optionally the width and height) until you are happy with the result.
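To see why scale helps, here is a minimal sketch that uses the built-in mtcars data as a stand-in for the map (the file names are arbitrary). ggsave() multiplies width and height by scale, while point sizes stay in absolute units, so a larger scale makes the points smaller relative to the plot.

```r
library(ggplot2)

# Stand-in plot: mtcars replaces the map data from the question
p <- ggplot(mtcars, aes(wt, mpg, size = hp)) +
  geom_point(colour = "red", alpha = 0.5) +
  scale_size_continuous(range = c(0.7, 5))

# Same plot, three device sizes: 3x3, 6x6 and 12x12 inches.
# The points keep their absolute size, so they look smaller as scale grows.
out_dir <- tempdir()
for (s in c(1, 2, 4)) {
  ggsave(file.path(out_dir, sprintf("points_scale_%d.pdf", s)),
         plot = p, scale = s, width = 3, height = 3)
}
```

Opening the three files side by side should make it easier to pick a scale that matches what you see on screen.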
DO NOT use geom_jitter(): this will add random XY variation to your points. To deal with overplotting you can try adding transparency - I added an alpha parameter for this. I also used theme_void() to get rid of axes and background.
Your shape file with map information is quite heavy: you can try a simple one with Swiss cantons, like this one.
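As for the geom_polygon idea raised in the question, here is a minimal sketch, assuming planar coordinates (such as the E/N columns) and coord_fixed(). The helper circle_polygons(), the made-up centres and the 2000-unit base radius are all hypothetical; they are not part of ggplot2. Radii are expressed in data units, so the circles keep their size relative to the map on export.

```r
# Hypothetical helper: one closed polygon (a circle) per centre, radius in data units
circle_polygons <- function(x, y, r, n_vertices = 60) {
  angles <- seq(0, 2 * pi, length.out = n_vertices + 1)[-1]
  do.call(rbind, lapply(seq_along(x), function(i) {
    data.frame(
      id = i,
      x  = x[i] + r[i] * cos(angles),
      y  = y[i] + r[i] * sin(angles)
    )
  }))
}

# Made-up centres and counts standing in for datmap$E, datmap$N and datmap$n
centres <- data.frame(E = c(600000, 650000, 700000),
                      N = c(200000, 180000, 250000),
                      n = c(1, 20, 80))

# Radius grows with sqrt(n), so the circle area is proportional to n;
# the base radius of 2000 data units is an arbitrary assumption
centres$r <- 2000 * sqrt(centres$n)
circles <- circle_polygons(centres$E, centres$N, centres$r)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(circles, aes(x, y, group = id)) +
    geom_polygon(fill = "red", alpha = 0.5) +
    coord_fixed()
}
```

The circle layer could then be added on top of the country polygon in place of geom_point. Note that no size legend is produced this way.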

Related

how to change the color scale for each graph with facet_wrap and legend

I have a question about facet_wrap() in ggplot2.
I am trying to make a graph like the one in my example (image 1).
In image 1 there are two maps, each with its own legend and color scale. I would like to be able to do this with ggplot and the facet_wrap() function.
My problem is that the data in the dataframe are very different: the value ranges differ a lot between maps, so a single shared scale does not let me visualize each map the way I want.
library(ggplot2)
library(metR) # geom_contour_fill() and ScaleDiscretised
ggplot(dataframe,mapping=aes(x=lon,y=lat))+
geom_contour_fill((aes(z=hgt,fill=stat(level)))+
geom_contour(aes(z=hgt),color="black",size=0.2)+
scale_fill_distiller(palette = "YlOrBr",direction = 1,super=ScaleDiscretised)+
mi_mapa+
coord_quickmap(xlim = range(dataframe$lon),ylim=range(dataframe$lat),expand = FALSE)+
facet_wrap(~nombre_nivel,scales="free", ncol =2) +
labs(x="Longitud",y="Latitud",fill="altura",title = "campos")
My dataframe has a shape like this, where the facets are determined by the level variable. In this case the dataframe has another variable, temp, instead of hgt, but it's just another name.
Thanks
I think I've faced a similar problem when building two parts of a single map with two different scales. I found the grid package useful.
library(grid)
grid.newpage()
print(myggplot, vp = specifiedviewport)
In my case I first built p <- ggplot(), then derived p2 <- p + ..., and printed the two ggplots (p and p2) in two viewports. You can also construct p2 from scratch with its own scale and print it in the grid. You can find useful information here.
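A minimal sketch of the viewport approach, assuming the built-in faithfuld data in place of the question's dataframe (the palettes and titles are arbitrary). Each plot is a separate ggplot object, so each gets its own fill scale and legend.

```r
library(ggplot2)
library(grid)

# Two plots whose fill variables have very different ranges
p1 <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster() +
  scale_fill_distiller(palette = "YlOrBr", direction = 1) +
  labs(title = "density")

p2 <- ggplot(faithfuld, aes(waiting, eruptions, fill = density * 1000)) +
  geom_raster() +
  scale_fill_distiller(palette = "Blues", direction = 1) +
  labs(title = "density x 1000", fill = "scaled")

# One row, two columns; each plot is printed into its own viewport
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2)))
print(p1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
popViewport()
```

The trade-off compared with facet_wrap() is that axes and titles are no longer shared automatically, so any common labelling has to be set on each plot.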

Is it possible to over-ride the x axis range in R package ggbio when using autoplot and ensdb transcripts?

I am trying to use ggbio to plot gene transcripts. I want to plot a very specific range so it matches my ggplot2 plots. The problem is my example plot ends up having range of 133,567,500-133,570,000 regardless of the GRange and whether I specify xlim or not.
This example should only plot a small bit of intron (the thin arrowed line) but instead plots the full 2 exons and intron in between. I believe autoplot wants to plot the entire transcript or transcripts present in the range and widens the range to accommodate for that.
library(EnsDb.Hsapiens.v86)
library(ggbio)
ensdb <- EnsDb.Hsapiens.v86
mut<-GRanges("10", IRanges(133568909, 133569095))
gene <- autoplot(ensdb, which=mut, names.expr="gene_name",xlim=c(133568909,133569095))
gene.gg <- gene@ggplot
png("test_gene_plot_5.png")
gene.gg
dev.off()
Is there any way to override this? I've looked at the manual page for autoplot and couldn't find an option that would fix it. Others have said to use xlim, but that does not seem to change anything.
I like ggbio because it can make a ggplot2 object to be plotted along with other ggplot2 objects. I have not seen an example for that with other approaches like Gvis. But I would entertain other approaches if they could be combined with my existing plots.
Thanks!
Amy
It kind of depends on whether you want clipped or squished data. Usually autoplot outputs a ggplot object at some point that can be manipulated as such.
For squished data:
library(GenomicRanges) # just to be sure start and end work
gene@ggplot +
scale_x_continuous(limits = c(start(mut), end(mut)), oob = scales::squish)
For clipped data:
gene@ggplot +
coord_cartesian(xlim = c(start(mut), end(mut)))
But to be totally honest, I'm unsure whether this is the most informative way to communicate that you are plotting the internals of an intron.
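The difference between the two approaches can be illustrated on plain numbers with the scales package. This is only a sketch with made-up coordinates, not the ggbio object itself. By default, scale limits censor out-of-bounds values (they become NA and are not drawn), whereas oob = scales::squish moves them to the nearest limit. coord_cartesian() leaves the data untouched and only zooms the view.

```r
library(scales)

window  <- c(40, 60)      # the x range we want to show
feature <- c(0, 50, 100)  # start, midpoint and end of a feature wider than the window

# Default out-of-bounds handling of scale limits: values outside become NA
censored <- censor(feature, range = window)

# oob = scales::squish: values outside are moved to the nearest limit
squished <- squish(feature, range = window)

censored
squished
```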
Alternatively, I've written a gene model geom at some point that doesn't work through the autoplot methods (which can sometimes be a pain if you want to customise everything). Downside is that you'd have to do some manual gene searching and setting aesthetics. Upside is that it works like most other geoms and is therefore easy to combine with some other data.
library(ggnomics) # from: https://github.com/teunbrand/ggnomics
# Finding a gene's exons manually
my_gene <- transcriptsByOverlaps(EnsDb.Hsapiens.v86, mut)
my_gene <- exonsByOverlaps(EnsDb.Hsapiens.v86, my_gene)
my_gene <- as.data.frame(my_gene)
some_other_data <- data.frame(
x = seq(start(mut), end(mut), by = 10),
y = cumsum(rnorm(19))
)
ggplot(some_other_data) +
geom_line(aes(x, y)) +
geom_genemodel(data = my_gene,
aes(xmin = start, xmax = end,
y = max(some_other_data$y) + 1,
group = 1, strand = strand)) +
coord_cartesian(xlim = c(start(mut), end(mut)))
Hope that helped!

How to set height of rows grid in graph lines on ggplots (R)?

I'm trying to plot a line graph using the ggplot library in R. The plot itself looks good, but I need to reduce the space (height) between the horizontal grid rows, because there is a big gap between the lines.
This is my R script:
library(ggplot2)
library(reshape2)
data <- read.csv('/Users/keepo/Desktop/G.Con/Int18/input-int18.csv')
chart_data <- melt(data, id='NRO')
names(chart_data) <- c('NRO', 'leyenda', 'DTF')
ggplot() +
geom_line(data = chart_data, aes(x = NRO, y = DTF, color = leyenda), size = 1)+
xlab("iteraciones") +
ylab("valores")
and this is my current graph:
..the first line is very distant from the second. How can I reduce the height?
regards.
The lines are far apart because the values of the variable plotted on the y-axis are far apart. If you need them closer together, you fundamentally have 3 options:
change the scale (e.g. convert the plot to a log scale), although this can make it harder for people to interpret the numbers. This can also change the behavior of each line, not just change the space between the lines. I'm guessing this isn't what you will want, ultimately.
normalize the data. If the actual value of the variable on the y-axis isn't important, just standardize the data (separately for each value of leyenda).
As stated above, you can graph each line separately. The main drawback here is that you need 3 graphs where 1 might do.
Not recommended:
I know that some graphs use a "squiggle" to change scales or skip space. Generally, this is considered poor practice (and I doubt it's an option in ggplot2) because it masks the true separation between the data points. If you really do want a gap, I would look at this post: "axis.break and ggplot2 or gap.plot? plot may be too complexe".
In a nutshell, the answer here depends on what your numbers mean. What is the story you are trying to tell? Is the important feature of your plots the change between them (in which case, normalizing might be your best option), or the actual numbers themselves (in which case, the space is relevant).
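Options 2 and 3 can be sketched as follows. The numbers are made up; only the column names NRO, leyenda and DTF come from the question.

```r
# Made-up long-format data with the column names used in the question
chart_data <- data.frame(
  NRO     = rep(1:5, times = 3),
  leyenda = rep(c("A", "B", "C"), each = 5),
  DTF     = c(3000, 3010, 3020, 3015, 3030,
              900, 905, 910, 908, 915,
              800, 802, 805, 803, 806)
)

# Option 2: standardise DTF within each level of leyenda (mean 0, sd 1)
chart_data$DTF_z <- ave(chart_data$DTF, chart_data$leyenda,
                        FUN = function(v) (v - mean(v)) / sd(v))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Option 2 plotted on a shared axis
  p_norm <- ggplot(chart_data, aes(NRO, DTF_z, colour = leyenda)) +
    geom_line()
  # Option 3: one panel per series, each with its own y axis
  p_facets <- ggplot(chart_data, aes(NRO, DTF)) +
    geom_line() +
    facet_wrap(~ leyenda, ncol = 1, scales = "free_y")
}
```

With scales = "free_y" the three panels stay in one figure, which softens the drawback of needing separate graphs.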
You could use an axis transformation that maps your data to the screen in a non-linear fashion:
fun_trans <- function(x){
d <- data.frame(x=c(800, 2500, 3100), y=c(800,1950, 3100))
model1 <- lm(y~poly(x,2), data=d)
model2 <- lm(x~poly(y,2), data=d)
scales::trans_new("fun",
function(x) as.vector(predict(model1,data.frame(x=x))),
function(x) as.vector(predict(model2,data.frame(y=x))))
}
last_plot() + scale_y_continuous(trans = "fun")

How to change color/shape/size for a subset of data after plotting in ggplot2

There are lots of situations where I use ggplot to create a nice looking graph, but I would like to play around with the colors/shapes/sizes for data belonging to a certain group (e.g. to highlight it).
I understand how to set these properties differently for each group when I first create the plot. However, I would like to know if there is a simple command to change the properties after the plot has been created (preferably without having to specify the properties for all other subsets).
As an example consider the following code:
library(ggplot2)
x = seq(0,1,0.2)
y = seq(0,1,0.2)
types = c("a","a","a","b","b","c")
df = data.frame(x,y,types)
table_of_colors = c("a"="red","b"="blue","c"="green")
table_of_shapes = c("a"=15,"b"=15,"c"=16)
my_plot = ggplot(df) +
theme_bw() +
geom_point(aes(x=x,y=y,color=types,shape=types),size=10) +
scale_color_manual(values = table_of_colors) +
scale_shape_manual(values=table_of_shapes)
which produces the following plot:
I'm wondering:
Is there a way to change the color of the green point (type=="c") without having to type out the colors for the other points?
Is there a way to change the shape of the blue/red points (type %in% c("a","b")) without having to type out the shapes for all the other points?
The size of all points is currently set to 10. Is there a way to change the size of only the green point to say 15, while keeping the size of all remaining points at 10?
I'm not sure if this is an existing feature, but hacks are welcome (so long as the changes will be reflected in the legend).
This seems kind of hacky to me, but the code below addresses items 1 and 2 in your list:
my_plot +
scale_colour_manual(values=c(table_of_colors[1:2],c="green")) +
scale_shape_manual(values=c(a=4,b=6, table_of_shapes[3]))
I thought maybe you could change the size with something like scale_size_manual(values=c(10,10,15)), but that doesn't work, perhaps because size was hard-coded, rather than set with an aesthetic to begin with.
It would probably be cleaner to just create new vectors of shapes, colors, etc., as needed, rather than to make individual ad hoc changes like those above.
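One way to do that is sketched below, assuming size is mapped to types from the start (unlike the original plot, where it is hard-coded) so that a manual scale can change it later. The name table_of_sizes and the replacement values are made up for illustration.

```r
library(ggplot2)

df <- data.frame(x = seq(0, 1, 0.2), y = seq(0, 1, 0.2),
                 types = c("a", "a", "a", "b", "b", "c"))
table_of_colors <- c(a = "red", b = "blue", c = "green")
table_of_sizes  <- c(a = 10, b = 10, c = 10)  # hypothetical lookup table for sizes

# Size is an aesthetic here, not a fixed value
my_plot <- ggplot(df) +
  geom_point(aes(x, y, colour = types, size = types)) +
  scale_colour_manual(values = table_of_colors) +
  scale_size_manual(values = table_of_sizes)

# Override single entries by name; all other entries are left untouched
new_colors <- replace(table_of_colors, "c", "purple")
new_sizes  <- replace(table_of_sizes, "c", 15)

highlighted <- my_plot +
  scale_colour_manual(values = new_colors) +
  scale_size_manual(values = new_sizes)
```

ggplot2 prints a message that the existing scale is being replaced, which is expected here. The legend follows the new values because the change goes through the scales.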

r stat_contour incorrect fill with polygon

When I use stat_contour with geom='polygon', some regions are filled even though there is no data there; I marked them in the figure. Does anyone know how to avoid that? In addition, there is space between the axes and the plot region. How can I remove it?
Here is the plotting code:
library(ggplot2)
library(gplots) # hist2d()
library(reshape2) # melt()
plot_contour <- function (da, native ) {
h2d<-hist2d(da$germ_div,da[[native]],nbins=40,show=F)
h2d$counts<-h2d$counts+1
counts<-log(h2d$counts, base=10)
rownames(counts)<-h2d$x
colnames(counts)<-h2d$y
counts<-melt(counts)
names(counts)<-c('x','y','z')
ggplot(counts,aes(x,y))+
stat_contour(expand=c(0,0),aes(z=z,fill=..level..),geom='polygon')+
stat_contour( data=counts[counts$x<=75,],aes(z=z,fill=..level..),bins=50,geom='polygon')+
scale_fill_gradientn(expand=c(0,0),colours=rainbow(1000),
limits=c(log(2,base=10),4),na.value='white',guide=F)+
geom_contour(aes(z=z,colour=..level..),size=1.5)+
scale_color_gradientn(colours=rainbow(30),limits=c(log(2,base=10),4),na.value='white',
guide=F) + theme_bw()+
scale_x_continuous(expand=c(0,0),limits=c(0,50))+
scale_y_continuous(expand=c(0,0),limits=c(40,100))+
labs(x=NULL, y=NULL, title=NULL)+
theme(axis.text.x = element_text(family='Times', colour="black", size=20, angle=NULL,
hjust=NULL,vjust=NULL,face="plain"),
axis.text.y = element_text( family='Times', colour="black", size=20,angle=NULL,
hjust=NULL,vjust=NULL,face="plain")
)
}
da<-read.table('test.txt',header=T)
i<-'test'
plot_contour(da,i)
This didn't fit in a comment, so posting as an answer:
stat_contour doesn't handle polygons that aren't closed very well. Additionally, there is a precision issue that crops up when setting the bins manually whereby the actual contour calculation can get freaked out (this happens when the contour bins are the same as plot data but aren't recognized as the same due to precision issues).
You can resolve the first issue by expanding your grid by 1 in every direction, and then setting every value in the matrix that is lower than the lowest you care about to some arbitrarily low value. This will force the contour calculation to close all the polygons that would otherwise be open at the edges of the plot. You can then set the limits with coord_cartesian(xlim=c(...)) to have your axes flush with the graph.
The second issue I don't know of a good way to solve without modifying the ggplot code. You may not be affected by this issue.
@BrodieG
Your answer is correct, but it's a bit difficult without some code.
Adding the following lines, with appropriate x,y values (these are a best guess), makes things clearer:
xlim(-10, 60)+
ylim(30, 120)+
coord_cartesian(xlim=c(0, 50),ylim=c(40, 100))
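The grid-padding step described in the first answer could look roughly like this. It is a sketch only: pad_grid() is a hypothetical helper, and it assumes the count matrix has evenly spaced coordinates stored in its row and column names, as in the question's code before melt().

```r
# Hypothetical helper: add a one-cell border filled with `low` around a matrix
# whose row and column names hold evenly spaced x and y coordinates
pad_grid <- function(m, low) {
  x <- as.numeric(rownames(m))
  y <- as.numeric(colnames(m))
  dx <- x[2] - x[1]
  dy <- y[2] - y[1]
  out <- matrix(low, nrow = nrow(m) + 2, ncol = ncol(m) + 2,
                dimnames = list(c(x[1] - dx, x, x[length(x)] + dx),
                                c(y[1] - dy, y, y[length(y)] + dy)))
  out[seq_len(nrow(m)) + 1, seq_len(ncol(m)) + 1] <- m
  out
}

# Tiny made-up grid: x = 10, 20 and y = 1, 2, 3
counts <- matrix(1:6, nrow = 2, dimnames = list(c(10, 20), c(1, 2, 3)))
padded <- pad_grid(counts, low = -1)

# Values below the lowest level of interest (here 2, an arbitrary choice)
# are pushed down to the same low value
clipped <- padded
clipped[clipped < 2] <- -1
```

The padded matrix can then be melted and plotted as before, with coord_cartesian() trimming the extra border from view.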

Resources