Mapping variable to hexagon size with geom_hex - r

Does anyone know if it's possible to map a variable to hexagon size with ggplot? Size is listed as an argument in the geom_hex documentation, but there are no examples of size mapping in stat_hexbin, so it seems to relate only to bin size.
Take for example:
ggplot(economics, aes(x=uempmed, y=unemploy)) + geom_hex()
Looking at the population distribution (below), for instance, it might be useful to map binned mean population to hexagon size, but I've not found a formula for this (if one exists).
ggplot(economics, aes(x=uempmed, y=unemploy, col=pop)) + geom_point()
Any ideas?

Apparently the official answer is that ggplot does not have functionality to map to hexagon area. But as you can see a workaround solution is possible, now posted in a gist at github.
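The gist itself is not reproduced here, so the following is only a minimal sketch of one possible workaround, not the linked solution. It assumes the hexbin package is installed (geom_hex depends on it anyway), bins the data outside ggplot, and then draws one marker per occupied cell with the binned mean mapped to marker size. The names hb, mean_pop and hex_df, and the choice of xbins = 15, are illustrative. Round markers stand in for true hexagons.

```r
library(ggplot2)
library(hexbin)

# Bin the observations; IDs = TRUE records each row's cell id in the cID slot
hb <- hexbin(economics$uempmed, economics$unemploy, xbins = 15, IDs = TRUE)

# Mean population per occupied cell, named by cell id
mean_pop <- tapply(economics$pop, hb@cID, mean)

# Cell centres, returned in the same order as hb@cell
centres <- hcell2xy(hb)
hex_df <- data.frame(
  x = centres$x,
  y = centres$y,
  mean_pop = as.numeric(mean_pop[as.character(hb@cell)])
)

# One marker per cell, with the binned mean mapped to marker area
p_hex <- ggplot(hex_df, aes(x, y, size = mean_pop)) +
  geom_point(shape = 16) +
  scale_size_area(max_size = 8) +
  labs(x = "uempmed", y = "unemploy", size = "Mean pop")
p_hex
```

If fill is an acceptable substitute for size, stat_summary_hex(aes(z = pop), fun = mean) maps the binned mean to hexagon colour directly.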

Related

Remembering steps in R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed last year.
I am a beginner in R, so this might appear irrelevant, but can anyone tell me how to remember syntax, such as the arguments of ggplot, tidyverse, or any other package?
There are a few ways to do that. You can start writing the function and press TAB, and its arguments will appear in a pop-up. You can also check the cheatsheets; here are some examples: https://www.rstudio.com/resources/cheatsheets/
Or you can check the help topic by writing the function with a ? at its start, for example: ?ggplot
OP, your question does not relate to coding per se - no problem to solve via issues with code - so it's not really supposed to be on SO. With that being said, it is a viable question and very daunting to approach using ggplot2 to create plots when you really don't have the background for doing so. Consequently, I think you still deserve a good answer, so here are some principles to help out a new user.
Know where to get information
The biggest help to offer is to practice. You will become more familiar with usage, but even "the pros" forget the argument syntax and what stuff does. In this case, the following is helpful:
Use RStudio. The base R terminal is fully capable; however, RStudio brings a ton of conveniences that make programming in R so much easier. Tooltips are an important part of how I create and use functions in R. If you start typing out a function, you'll be presented with a short list of arguments:
What's more, you can start typing an argument and you'll get a description from the help directly within RStudio:
Check the help for functions. This one should be obvious, but I am constantly checking the help for functions on CRAN. This is easily done in RStudio by typing ? before the function. So, if I need to know the arguments and syntax for geom_point(), I'll type ?geom_point into the console and you'll get the documentation directly within RStudio.
Online Resources. A quick search online can give you a lot of information (maybe even this answer). There are a lot of other resources: here too. Including here, here, here, and here.
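As a small illustration of the "check the help" advice, here are a few console calls that recover a function's arguments without leaving R. This is a sketch using base R utilities plus ggplot2. The variable name arg_names is only for illustration.

```r
library(ggplot2)

# Open the help page; equivalent to typing ?geom_point at the console
help("geom_point")

# Print the function signature without opening the full help page
args(geom_point)

# Just the argument names, handy when you only forgot a spelling
arg_names <- names(formals(geom_point))
print(arg_names)

# List attached functions whose names start with a pattern
apropos("^geom_hex")
```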
Become familiar with the Principles of plotting in ggplot2
Knowing where to get information is helpful, but sometimes you feel so lost that you don't even know what information you actually are looking to get. This is the crux of many of the questions here on SO related to ggplot2, which is: "how can I change my axes?", "How do I change colors in the plot?", or "How can I get a legend to show x, y, or z?". Sometimes you can google, but often it's not even clear what you are looking to find.
This is where a fundamental understanding of how to create a plot in ggplot2 becomes useful. I'll go through how I always approach plotting in ggplot2 and hopefully this will help you out a bit.
Step 1 - prepare data
Making your data prepared to plot is exceptionally useful, and sometimes difficult to do. It's a bit beyond what I intend to communicate here, but a mandatory piece of reading would be regarding Tidy Data Principles.
Step 2 - Think about Mapping
Mapping is often overlooked in the process, but in short, this is how the columns of your dataset relate to the plot. It's easy to say "this column will be my x axis" and "this column will be my y axis", but you should also be clear on if the values of other columns will relate to color, fill, size, shape, etc etc... Thinking this way, it will soon be quite obvious why you would want to get Step 1 correct above, because only Tidy data will be able to be used directly in mapping without issue.
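To make the link between Steps 1 and 2 concrete, here is a minimal sketch with made-up data. It assumes the tidyr package is available. The wide table has one column per series, so nothing can be mapped to color until it is reshaped into long (tidy) form. The names wide, long, series and value are hypothetical.

```r
library(ggplot2)
library(tidyr)

# Wide data: each series sits in its own column, so there is no single
# column that could be mapped to color
wide <- data.frame(
  year = 2001:2005,
  a = c(1, 3, 2, 5, 4),
  b = c(2, 2, 4, 3, 6)
)

# Long (tidy) data: one row per observation, with the series name as a column
long <- pivot_longer(wide, cols = c(a, b),
                     names_to = "series", values_to = "value")

# Now every aesthetic maps to exactly one column
p_long <- ggplot(long, aes(year, value, color = series)) +
  geom_line()
p_long
```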
Step 3 - The Fundamental ggplot() call
The first step in plotting will be your first call to ggplot(). Here you need to assign data, for example via df %>% ggplot(...) or ggplot(data=df, ...). This is also typically where you would set up at least your x and y axes via mapping. You can just stop here (x and y axes), or you can specify the other aesthetics in the mapping here too. Ultimately, this call alone "sets up" the plot. If we just plot the result of that, you get the following:
p <- ggplot(mtcars, aes(disp, mpg))
p
Step 4 - Add your geoms
A "geom" (short for "geometry") describes the shapes and "things" on your plot that will be positioned on the x and y axes. You can add any number, but in this example, we'll add points. If all you want to do is plot the observations at the x and y axes, you just need to add geom_point() and that should be enough:
p + geom_point()
Step 5 - Adding Legends
Note we don't have a legend yet. This is because there are no aesthetics mapped other than x and y. ggplot2 creates legends automatically when you specify in the mapping (via aes()) a characteristic way of differentiating the way we draw a geom. In other words, we can describe color= within aes() and that will initiate the creation of a legend. You can do the same with other aesthetics too.
p + geom_point(aes(color=cyl))
This creates a legend type depending on the type of data mapped. So, a colorbar legend is created here because the column mtcars$cyl is numeric. If we use a non-numeric column, you get a discrete legend:
p + geom_point(aes(color=rownames(mtcars)))
There's advanced stuff too... but not covered here.
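One common variation on the two legend types above, as a small sketch: since cyl only takes a few distinct values, wrapping it in factor() inside aes() tells ggplot2 to treat it as discrete, which swaps the colorbar for a keyed legend. The object names here are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(disp, mpg))

# Numeric column: continuous colorbar legend
p_continuous <- p + geom_point(aes(color = cyl))

# Same column converted to a factor: discrete legend with one key per level
p_discrete <- p + geom_point(aes(color = factor(cyl)))

p_continuous
p_discrete
```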
Step 6 - Adjusting the Scales
When you specify a mapping (i.e. aes(color = ...)), all you define is how the data is mapped to that aesthetic. This does not specify the actual colors to be used. If you don't specify them, the default coloring or sizing is used, but sometimes you want to change that. You can do that via the scale_*_*() functions, of which there are many depending on your application. For information on color scales, you can see this answer here, but suffice it to say this is quite a complicated part of plotting that depends greatly on what you want to do. Many of the scale_*_*() functions are structured similarly, so you can probably get an idea of what you can do from that answer. Here's an example of how we can adjust the color with one of these functions:
p + geom_point(aes(color=cyl)) +
scale_color_gradient(low="red", high="green")
Step 7 - Adjusting Labels
Here I usually add the plot labels and axis labels. You can conveniently use ylab(), xlab(), or ggtitle() to assign axis labels and the title, or just define them all together with labs(y = ..., x = ..., title = ...). You can also use this time to format and arrange things associated with legends and scales (tick marks and whatnot) via guides(...) (for legends) or the scale_x_*() and scale_y_*() functions (for tick marks on axes).
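A short sketch of the two labelling styles mentioned in this step. Both plots should end up with the same labels; the label text is made up for illustration.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(disp, mpg)) + geom_point()

# Style 1: one helper per label
p_separate <- base +
  xlab("Displacement") +
  ylab("Miles per gallon") +
  ggtitle("Fuel economy vs. displacement")

# Style 2: everything in a single labs() call
p_together <- base +
  labs(x = "Displacement", y = "Miles per gallon",
       title = "Fuel economy vs. displacement")

p_separate
p_together
```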
Step 8 - Theme Elements
Finally, you can change the overall look with various ggplot themes. An account of default themes is given here, but you can extend that with the ggthemes package to get more. You might want to just change a few specific elements of size, color, linetype, etc. on the plot. You can address these specific elements via theme(). A helpful list of theme elements is given here.
So, putting it all together you have:
# initial call
ggplot(mtcars, aes(disp, mpg)) +
  # geoms
  geom_point(aes(color=cyl), size=3) +
  # define the color scale
  scale_color_viridis_c() +
  # define labels and ticks and stuff
  # axis
  scale_x_continuous(breaks = seq(0, 600, by=50)) +
  # legend ticks
  guides(color=guide_colorbar(ticks.colour = "black", ticks.linewidth = 3)) +
  # Labels
  labs(x="Disp", y="Miles per gallon (mpg)", color = "# of \ncylinders", title="Ugly Plot 1.0") +
  # theme and theme elements
  theme_bw() +
  theme(
    panel.background = element_rect(fill="gray90"),
    panel.grid.major = element_line(color="gray20", linetype=2, size=0.2),
    panel.grid.minor = element_line(color="gray70", linetype=2, size=0.1),
    axis.text = element_text(size=12, face = "bold"),
    axis.text.x = element_text(angle=30, hjust=1)
  )
It's a lot of steps, but I break it down like that basically every time. When plot code gets large, I break up the chunks much in that manner above to help clear my mind on how to create the plot.

Turn pixel image into scalable vector graphic in ggplot

Please note that I'm not interested in any kind of interpolation algorithm where you expand the number of pixels and interpolate the new values.
I want to leave the world of pixel based images and am looking for some scalable vector image solution.
Is there a way to turn a pixel image in a ggplot into a color meshed smooth vector graphic?
The following pictures demonstrate what I'm aiming for.
and then smooth it out.
The images are taken from the following wikipedia article HERE
Please note that the original images from the article are SVG files. You can zoom in as much as you want and you always have smooth color transitions and no edges.
Some additional images and information: HERE2
Here is some example data that produces something like the first image ("nearest"):
library(ggplot2)
n <- 5
pixelImg <- expand.grid(x=1:n, y=1:n)
pixelImg$value <- sample(1:n^2, n^2, replace = TRUE)
ggplot(pixelImg, aes(x, y)) +
  geom_raster(aes(fill = value)) +
  scale_fill_gradientn(colours=c("#FFCd94", "#FF69B4", "#FF0000","#4C0000","#000000"))
If not in ggplot is there a way to do it outside of ggplot?
Look into the ggsave() function. It supports .svg files for vector graphics.
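A minimal sketch of that suggestion, reusing the example data from the question. It assumes the svglite package is installed, which ggsave() relies on for .svg output. The output path and the names pixel_plot and svg_file are illustrative. Note that saving to SVG only changes the file format: the cells are still drawn as discrete blocks, so this alone does not produce smooth colour transitions between them.

```r
library(ggplot2)

set.seed(42)  # only so the random example image is repeatable
n <- 5
pixelImg <- expand.grid(x = 1:n, y = 1:n)
pixelImg$value <- sample(1:n^2, n^2, replace = TRUE)

pixel_plot <- ggplot(pixelImg, aes(x, y)) +
  geom_raster(aes(fill = value)) +
  scale_fill_gradientn(
    colours = c("#FFCd94", "#FF69B4", "#FF0000", "#4C0000", "#000000")
  )

# Write the plot as SVG; the device is picked from the file extension
svg_file <- file.path(tempdir(), "pixel_image.svg")
ggsave(svg_file, plot = pixel_plot, width = 5, height = 5)
```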

ggplot legend list is larger than page

I have a plot in R which has a very large number of sample groups, and therefore the legend is larger than the page size and is cut off. I understand that this is not publication quality, but I need to know the colours to be able to make the legend in Illustrator.
Is there a way to make the page size much bigger or somehow change the legend format so that I can include all the keys? The reason for this is so that I can open the PDF in Illustrator and get the colours for each sample to create a new legend that will be for publication. I thought that maybe there is a clipping mask, and that the actual legend will be preserved, but when I opened it in Illustrator, the legend was actually cut at the page ends.
As was suggested in the comments below, I gave nrow a try, which helped break the legends up, but now the entire page is just legends.
ggplot(purine.n, aes(x=variable, y=value, colour=metabolite_gene, shape=variable)) +
  geom_abline(slope=0) +
  geom_point(size=4, position=position_dodge(width=0.08)) +
  scale_y_continuous(limits=c(-3.5,5.5), breaks=c(-3,-2,-1,0,1,2,3,4,5)) +
  scale_shape_manual(values=c(16,17,17), guide="none") +
  theme_bw() +
  theme(legend.key=element_blank(), legend.key.size=unit(1,"point")) +
  guides(colour=guide_legend(nrow=16))
As was suggested in the comments, nrow was the answer to my problem. I had to adjust the value to get the right number of rows to fit my legend. Below is the completed code that worked. There's more tweaking I need to do, like change page size to help make things look better, but that is out of the scope of this question.
ggplot(data.n, aes(x=variable, y=value, colour=metabolite_gene, shape=variable)) +
  geom_abline(slope=0) +
  geom_point(size=4, position=position_dodge(width=0.08)) +
  scale_y_continuous(limits=c(-3.5,5.5), breaks=c(-3,-2,-1,0,1,2,3,4,5)) +
  scale_shape_manual(values=c(16,17,17), guide="none") +
  theme_bw() +
  theme(legend.key=element_blank(), legend.key.size=unit(1,"point")) +
  guides(colour=guide_legend(nrow=30))
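Since the original data is not available, here is a self-contained sketch of the same idea with made-up data. It uses guide_legend(nrow = ...) to wrap the keys into several columns, and it also covers the "make the page bigger" half of the question by passing explicit dimensions to ggsave(). The data frame many, the group names, and the page size are all hypothetical; the right nrow and dimensions depend on the number of keys.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: 60 groups, enough to overflow a one-column legend
many <- data.frame(
  x = runif(120),
  y = runif(120),
  grp = factor(rep(sprintf("sample_%02d", 1:60), each = 2))
)

p_many <- ggplot(many, aes(x, y, colour = grp)) +
  geom_point(size = 3) +
  theme_bw() +
  theme(legend.position = "bottom", legend.key.size = unit(8, "pt")) +
  # Wrap the 60 keys into 10 rows instead of one long column
  guides(colour = guide_legend(nrow = 10))

# A larger page (in inches) leaves room for both the panel and the legend
out_file <- file.path(tempdir(), "many_groups.pdf")
ggsave(out_file, p_many, width = 14, height = 10)
```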

keep colour palette constant between plots

I need to compare two maps of the same quantities. I would like to keep the colour palette constant across the two graphs to ease comprehension, but I don't see how to do so.
Should I set limits (e.g. assign the minimum across all the plots to low and the maximum to high)?
Is there an easy way to do so?
I am new to this, so sorry if the solution is trivial. I went through a lot of blog posts but did not find anything.
My code:
fin<-get_map("Helsinki",zoom=12)
ggmap(fin, legend="bottom")+
  geom_polygon(data=a, aes(x=long, y=lat, group=id, fill=Test_statistics), alpha=0.1, colour="white")
To give you an idea, this is an image
and this is another
it is not clear at all!
Images still need a bit of "prettyfying" it is just to give an idea
Basically what I would like is in this question, but for discrete (factor) values
I can't reproduce your plots because you've not given us the data, but setting limits in a scale_colour_gradient should work. See:
http://docs.ggplot2.org/0.9.3.1/scale_gradient.html
under "Tweak scale limits" (second example) where Hadley says:
Setting the limits manually is also useful when producing
multiple plots that need to be comparable
For example (and I'm using points here for simplicity - you probably have to use scale_fill_gradient to set the fill colour for polygons - I don't have the time to build some polygons):
set.seed(310366)
d <- data.frame(x=runif(20), y=runif(20),
                z1=rnorm(20), z2=rnorm(20)+5)
Note that z1 has a range of about -1 to 1, and z2 has a range of 4 to 7. This helps us see the effect.
ggplot(d, aes(x=x, y=y, col=z1)) + geom_point(size=8) +
  scale_colour_gradient(limits=range(c(d$z1, d$z2)))
ggplot(d, aes(x=x, y=y, col=z2)) + geom_point(size=8) +
  scale_colour_gradient(limits=range(c(d$z1, d$z2)))
produces two plots with the same limits on the palette legend, but the first one has very dark points because the values are all low (-1 to 1) and the second one has mostly light colours because the values are all high (4 to 7).
Both sets of points have been coloured using the same mapping of value to colour because of the limit argument in the scale_colour_gradient function. You are mapping the fill attribute so I think you need scale_fill_gradient.
I didn't get your problem exactly, but try adding this to all your plots. Then the colour coding should be uniform.
+scale_colour_brewer(palette="Set1")
You can use any of the palettes shown here with examples
http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#color-charts
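For the discrete (factor) case the question asks about, one approach is to fix both the palette and the set of levels so that each level gets the same colour in every plot, even when a plot lacks some levels. This is a sketch with made-up data. The level names, the colours, and the helper shared_scale() are all hypothetical. For polygons, scale_fill_manual() would be the analogous call.

```r
library(ggplot2)

# Hypothetical levels and a named palette: names tie each level to a colour
lvls <- c("low", "medium", "high")
pal <- c(low = "#1b9e77", medium = "#d95f02", high = "#7570b3")

# Two data sets that each contain only some of the levels
d1 <- data.frame(x = 1:3, y = 1:3,
                 cls = factor(c("low", "low", "medium"), levels = lvls))
d2 <- data.frame(x = 1:3, y = 3:1,
                 cls = factor(c("medium", "high", "high"), levels = lvls))

# Build a fresh, identical scale for every plot; fixed limits keep all
# three keys in both legends
shared_scale <- function() scale_colour_manual(values = pal, limits = lvls)

p1 <- ggplot(d1, aes(x, y, colour = cls)) + geom_point(size = 8) + shared_scale()
p2 <- ggplot(d2, aes(x, y, colour = cls)) + geom_point(size = 8) + shared_scale()

p1
p2
```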

Indicating the statistically significant difference in bar graph USING R

This is a repeat of a question originally asked here: Indicating the statistically significant difference in bar graph, but asked for R instead of Python.
My question is very simple. I want to produce barplots in R, using ggplot2 if possible, with an indication of significant difference between the different bars, e.g. produce something like this. I have had a search around but can't find another question asking exactly the same thing.
I know that this is an old question and the answer by Didzis Elferts already provides one solution for the problem. But I recently created a ggplot-extension that simplifies the whole process of adding significance bars: ggsignif
Instead of tediously adding the geom_path and annotate to your plot you just add a single layer geom_signif:
library(ggplot2)
library(ggsignif)
ggplot(iris, aes(x=Species, y=Sepal.Length)) +
geom_boxplot() +
geom_signif(comparisons = list(c("versicolor", "virginica")),
map_signif_level=TRUE)
Full documentation of the package is available at CRAN.
You can use geom_path() and annotate() to get a similar result. For this example you have to determine a suitable position yourself. In geom_path(), four numbers are provided to get those small ticks for the connecting lines.
df<-data.frame(group=c("A","B","C","D"),numb=c(12,24,36,48))
g<-ggplot(df,aes(group,numb))+geom_bar(stat="identity")
g+geom_path(x=c(1,1,2,2),y=c(25,26,26,25))+
geom_path(x=c(2,2,3,3),y=c(37,38,38,37))+
geom_path(x=c(3,3,4,4),y=c(49,50,50,49))+
annotate("text",x=1.5,y=27,label="p=0.012")+
annotate("text",x=2.5,y=39,label="p<0.0001")+
annotate("text",x=3.5,y=51,label="p<0.0001")
I used the suggested method from above, but I found the annotate function easier for making lines than the geom_path function. Just use "segment" instead of "text". You have to break things up by segment and define starting and ending x and y values for each line segment.
Example for making 3 line segments:
annotate("segment", x=c(1,1,2),xend=c(1,2,2), y= c(125,130,130), yend=c(130,130,125))
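To show the segment approach in a complete plot, here is a sketch that reuses the four-bar example data from the geom_path() answer. The y positions are adapted to that data rather than taken from the line above, and the p-value label is the placeholder from the earlier answer, not a computed result.

```r
library(ggplot2)

df <- data.frame(group = c("A", "B", "C", "D"), numb = c(12, 24, 36, 48))

p_sig <- ggplot(df, aes(group, numb)) +
  geom_col() +
  # Three segments form one bracket: left tick, horizontal bar, right tick
  annotate("segment",
           x = c(1, 1, 2), xend = c(1, 2, 2),
           y = c(25, 26, 26), yend = c(26, 26, 25)) +
  # Placeholder label centred above the bracket
  annotate("text", x = 1.5, y = 27, label = "p=0.012")
p_sig
```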

Resources