keep colour palette constant between plots - r

I need to compare two maps of the same quantities, and I would like to keep the colour palette constant across the two plots to make them easier to compare, but I can't work out how to do so.
Should I set limits (e.g. assign the minimum across all the plots to low and the maximum to high)?
Is there an easy way to do so?
I am new to this, so sorry if the solution is trivial. I went through a lot of blog posts but I am not finding anything.
My code:
fin<-get_map("Helsinki",zoom=12)
ggmap(fin, legend="bottom")+
geom_polygon(data=a,aes(x=a$long,y=a$lat, id=id, fill=Test_statistics), alpha=0.1, colour="white")
To give you an idea, this is an image
and this is another
it is not clear at all!
Images still need a bit of "prettyfying" it is just to give an idea
Basically what I would like is in this question, but for discrete (factor) values

I can't reproduce your plots because you've not given us the data, but setting limits in a scale_colour_gradient should work. See:
http://docs.ggplot2.org/0.9.3.1/scale_gradient.html
under "Tweak scale limits" (second example) where Hadley says:
Setting the limits manually is also useful when producing
multiple plots that need to be comparable
For example (and I'm using points here for simplicity - you probably have to use scale_fill_gradient to set the fill colour for polygons - I don't have the time to build some polygons):
set.seed(310366)
d <- data.frame(x=runif(20), y=runif(20),
                z1=rnorm(20), z2=rnorm(20)+5)
Note that z1 has a range of about -1 to 1, and z2 has a range of about 4 to 7. This helps us see the effect.
ggplot(d, aes(x=x, y=y, col=z1)) + geom_point(size=8) +
  scale_colour_gradient(limits=range(c(d$z1, d$z2)))
ggplot(d, aes(x=x, y=y, col=z2)) + geom_point(size=8) +
  scale_colour_gradient(limits=range(c(d$z1, d$z2)))
produces two plots with the same limits on the palette legend, but the first one has very dark points because the values are all low (-1 to 1) and the second one has mostly light colours because the values are all high (4 to 7).
Both sets of points have been coloured using the same mapping of value to colour because of the limits argument in the scale_colour_gradient function. You are mapping the fill attribute, so I think you need scale_fill_gradient.
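For the fill case, a minimal sketch along the same lines might look like this. It uses filled points (shape 21) as a stand-in for the asker's polygons, and the helper name shared_fill is made up for illustration; only the shared limits matter.

```r
library(ggplot2)

# Toy data standing in for the two maps (not the asker's polygons).
set.seed(310366)
d <- data.frame(x = runif(20), y = runif(20),
                z1 = rnorm(20), z2 = rnorm(20) + 5)

# One range covering both variables, computed once.
shared_limits <- range(c(d$z1, d$z2))

# Hypothetical helper: returns a fresh fill scale with the shared limits.
shared_fill <- function() scale_fill_gradient(limits = shared_limits)

# shape = 21 is a fillable circle, so the fill aesthetic is what gets coloured.
p1 <- ggplot(d, aes(x, y, fill = z1)) +
  geom_point(shape = 21, size = 8) + shared_fill()
p2 <- ggplot(d, aes(x, y, fill = z2)) +
  geom_point(shape = 21, size = 8) + shared_fill()
```

Adding the same scale to a geom_polygon layer should work the same way, since the scale only cares about the fill aesthetic and not the geom.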

I didn't get your problem exactly, but try adding this to all your plots (or scale_fill_brewer() if you are mapping fill). Then the colour coding should be uniform.
+ scale_colour_brewer(palette="Set1")
You can use any of the palettes shown here, with examples:
http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#color-charts
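For discrete (factor) values, one possible way to pin each level to the same colour in every plot is a named vector passed to scale_fill_manual. This is only a sketch: the level names, colours, data frames, and the helper fixed_fill are all invented for illustration.

```r
library(ggplot2)

# Hypothetical factor levels and a named palette: one fixed colour per level.
lvls <- c("low", "medium", "high")
pal  <- c(low = "#1b9e77", medium = "#d95f02", high = "#7570b3")

# Two toy data sets; the second one only contains the "high" level.
map_a <- data.frame(x = 1:3, y = 1,
                    cls = factor(c("low", "medium", "high"), levels = lvls))
map_b <- data.frame(x = 1:2, y = 1,
                    cls = factor(c("high", "high"), levels = lvls))

# Hypothetical helper: same values and same limits for every plot.
fixed_fill <- function() scale_fill_manual(values = pal, limits = lvls)

pa <- ggplot(map_a, aes(x, y, fill = cls)) + geom_tile() + fixed_fill()
pb <- ggplot(map_b, aes(x, y, fill = cls)) + geom_tile() + fixed_fill()
```

Because the colours are matched by name, "high" gets the same colour in both plots even though map_b has no "low" or "medium" rows.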

Related

Remembering steps in R [closed]

I am a beginner in R, so this might seem like a trivial question, but can anyone tell me how to remember syntax, like the arguments of ggplot, tidyverse functions, or any other package?
There are a few ways to do that. You can start typing the function name and press TAB, and the arguments will appear in a pop-up. You can also check the cheatsheets, for example: https://www.rstudio.com/resources/cheatsheets/
Or you can open the help topic by putting a ? before the function name, for example: ?ggplot
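If you are not in RStudio, the same information is available from the console. A small sketch using only base R introspection (geom_point is just an example function):

```r
library(ggplot2)

# Print the argument list of a function without opening the help page.
args(geom_point)

# Get the argument names as a character vector, e.g. to search through them.
arg_names <- names(formals(geom_point))
print(arg_names)

# Open the full help topic; equivalent to typing ?geom_point.
help("geom_point")
```

args() and formals() only show the arguments the function declares itself; anything passed through ... is documented in the help topic.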
OP, your question does not relate to coding per se - no problem to solve via issues with code - so it's not really supposed to be on SO. With that being said, it is a viable question and very daunting to approach using ggplot2 to create plots when you really don't have the background for doing so. Consequently, I think you still deserve a good answer, so here are some principles to help out a new user.
Know where to get information
The biggest help to offer is to practice. You will become more familiar with usage, but even "the pros" forget the argument syntax and what stuff does. In this case, the following is helpful:
Use RStudio. The base R terminal is fully capable; however, RStudio brings a ton of conveniences that make programming in R so much easier. Tooltips are an important part of how I create and use functions in R. If you start typing out a function, you'll be presented with a short list of arguments:
What's more, you can start typing an argument and you'll get a description from the help directly within RStudio:
Check the help for functions. This one should be obvious, but I am constantly checking the help for functions on CRAN. This is easily done in RStudio by typing ? before the function. So, if I need to know the arguments and syntax for geom_point(), I'll type ?geom_point into the console and you'll get the documentation directly within RStudio.
Online Resources. A quick search online can give you a lot of information (maybe even this answer). There are a lot of other resources: here too. Including here, here, here, and here.
Become familiar with the Principles of plotting in ggplot2
Knowing where to get information is helpful, but sometimes you feel so lost that you don't even know what information you actually are looking to get. This is the crux of many of the questions here on SO related to ggplot2, which is: "how can I change my axes?", "How do I change colors in the plot?", or "How can I get a legend to show x, y, or z?". Sometimes you can google, but often it's not even clear what you are looking to find.
This is where a fundamental understanding of how to create a plot in ggplot2 becomes useful. I'll go through how I always approach plotting in ggplot2 and hopefully this will help you out a bit.
Step 1 - prepare data
Making your data prepared to plot is exceptionally useful, and sometimes difficult to do. It's a bit beyond what I intend to communicate here, but a mandatory piece of reading would be regarding Tidy Data Principles.
Step 2 - Think about Mapping
Mapping is often overlooked in the process, but in short, this is how the columns of your dataset relate to the plot. It's easy to say "this column will be my x axis" and "this column will be my y axis", but you should also be clear on if the values of other columns will relate to color, fill, size, shape, etc etc... Thinking this way, it will soon be quite obvious why you would want to get Step 1 correct above, because only Tidy data will be able to be used directly in mapping without issue.
Step 3 - The Fundamental ggplot() call
The first step in plotting will be your first call to ggplot(). Here you need to assign the data, for example via df %>% ggplot(...) or ggplot(data=df, ...). This is also typically where you would set up at least your x and y axes via mapping. You can just stop here (x and y axes), or you can specify the other aesthetics in the mapping here too. On its own, this call only "sets up" the plot. If we plot the result, you get the following:
p <- ggplot(mtcars, aes(disp, mpg))
p
Step 4 - Add your geoms
A "geom" (short for "geometry") describes the shapes and "things" on your plot that will be positioned on the x and y axes. You can add any number, but in this example, we'll add points. If all you want to do is plot the observations at the x and y axes, you just need to add geom_point() and that should be enough:
p + geom_point()
Step 5 - Adding Legends
Note we don't have a legend yet. This is because there are no aesthetics mapped other than x and y. ggplot2 creates legends automatically when you specify in the mapping (via aes()) a characteristic way of differentiating the way we draw a geom. In other words, we can describe color= within aes() and that will initiate the creation of a legend. You can do the same with other aesthetics too.
p + geom_point(aes(color=cyl))
This creates a legend type depending on the type of data mapped. So, a colorbar legend is created here because the column mtcars$cyl is numeric. If we use a non-numeric column, you get a discrete legend:
p + geom_point(aes(color=rownames(mtcars)))
There's advanced stuff too... but not covered here.
Step 6 - Adjusting the Scales
When you specify a mapping (i.e. aes(color = ...)), all you define is how the data is mapped to that aesthetic. This does not specify the actual colors to be used. If you don't specify them, the default coloring or sizing is used, but sometimes you want to change that. You can do that via the scale_*_*() functions... of which there are many depending on your application. For information on color scales, you can see this answer here... but suffice it to say this is quite a complicated part of plotting that depends greatly on what you want to do. Many of the scale_*_*() functions are structured similarly, so you can probably get an idea of what you can do from that answer. Here's an example of how we can adjust the color with one of these functions:
p + geom_point(aes(color=cyl)) +
scale_color_gradient(low="red", high="green")
Step 7 - Adjusting Labels
Here I usually add the plot labels and axis labels. You can conveniently use ylab(), xlab(), or ggtitle() to assign axis labels and the title, or just define them all together with labs(y = ..., x = ..., title = ...). You can also use this time to format and arrange things associated with legends and scales (tick marks and whatnot) via guides(...) (for legends) or the scale_x_*() and scale_y_*() functions (for tick marks on axes).
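As a small illustration of this step on its own, here is a sketch using the same mtcars plot. The label text and break positions are arbitrary choices, not anything prescribed.

```r
library(ggplot2)

p_labeled <- ggplot(mtcars, aes(disp, mpg)) +
  geom_point(aes(color = cyl)) +
  # all labels in one call: axes, legend title, and plot title
  labs(x = "Displacement (cu. in.)",
       y = "Miles per gallon",
       color = "Cylinders",
       title = "Fuel economy vs. displacement") +
  # tick marks on the x axis are controlled through the x scale
  scale_x_continuous(breaks = seq(100, 500, by = 100))

p_labeled
```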
Step 8 - Theme Elements
Finally, you can change the overall look with various ggplot themes. An account of default themes is given here, but you can extend that with the ggthemes package to get more. You might want to just change a few specific elements of size, color, linetype, etc. on the plot. You can address these specific elements via theme(). A helpful list of theme elements is given here.
So, putting it all together you have:
# initial call
ggplot(mtcars, aes(disp, mpg)) +
# geoms
geom_point(aes(color=cyl), size=3) +
# define the color scale
scale_color_viridis_c() +
# define labels and ticks and stuff
# axis
scale_x_continuous(breaks = seq(0, 600, by=50)) +
# legend ticks
guides(color=guide_colorbar(ticks.colour = "black", ticks.linewidth = 3)) +
# Labels
labs(x="Disp", y="Miles per gallon (mpg)", color = "# of \ncylinders", title="Ugly Plot 1.0") +
# theme and theme elements
theme_bw() +
theme(
panel.background = element_rect(fill="gray90"),
panel.grid.major = element_line(color="gray20", linetype=2, size=0.2),
panel.grid.minor = element_line(color="gray70", linetype=2, size=0.1),
axis.text = element_text(size=12, face = "bold"),
axis.text.x = element_text(angle=30, hjust=1)
)
It's a lot of steps, but I break it down like that basically every time. When plot code gets large, I break up the chunks much in that manner above to help clear my mind on how to create the plot.

Change space between bars in histogram - R

I created a histogram in RStudio with the following code:
ggplot(data_csv, aes(x=Phasenew, fill=Success)) +
geom_histogram(binwidth = 1, position = "dodge", color="white")
What I want to do now is add more space between the bars of the histogram. I already tried the "width" parameter, but that one does not work in geom_histogram. I also tried making the white outline thicker, but then the bars no longer show the correct length. Does anyone have an idea how to do that?
As two people already wrote in the comments, I also feel that your attempt to change the space between the bars of a histogram is based on a misunderstanding about the nature of a histogram. Here the frequency of your events is represented by the areas of the cells in the histogram. Or to quote Wikipedia:
the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval
A priori these cells do not even need to have the same width (in the case your class widths would differ).
Perhaps what you are looking for is geom_bar (https://ggplot2.tidyverse.org/reference/geom_bar.html)
ggplot(data_csv, aes(x=Phasenew, fill=Success)) +
geom_bar()
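If the goal is visible gaps between dodged bars, one option with geom_bar is to make the bars narrower than the dodge width. The sketch below assumes Phasenew can be treated as a category, and the data frame is an invented stand-in for data_csv, which was not posted.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data_csv.
data_csv <- data.frame(
  Phasenew = c(1, 1, 2, 2, 2, 3, 3, 1, 2, 3),
  Success  = rep(c("yes", "no"), 5)
)

p_bars <- ggplot(data_csv, aes(x = factor(Phasenew), fill = Success)) +
  # width: total width of the bars at each x position;
  # position_dodge(width): how far apart the dodged bars are spread.
  # A bar width smaller than the dodge width leaves a gap between bars.
  geom_bar(width = 0.6, position = position_dodge(width = 0.8))

p_bars
```

The values 0.6 and 0.8 are arbitrary; widening the difference between them widens the gap.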

ggplot draw multiple plots by levels of a variable

I have a sample dataset
d=data.frame(n=rep(c(1,1,1,1,1,1,2,2,2,3),2),group=rep(c("A","B"),each=20),stringsAsFactors = F)
And I want to draw two separate histograms based on group variable.
I tried this method suggested by @jenesaisquoi in a separate post here
Generating Multiple Plots in ggplot by Factor
ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)+facet_wrap(~group)
It did the trick but if you look closely, the proportions are wrong. It didn't calculate the proportion for each group but rather a grand proportion. I want the proportion to be 0.6 for number 1 for each group, not 0.3.
Then I tried dplyr package, and it didn't even create two graphs. It ignored the group_by command. Except the proportion is right this time.
d%>%group_by(group)%>%ggplot(data=.)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)
Finally I tried factoring with color
ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..),color=group),binwidth = 1)
But the result is far from ideal. I was going to accept one output but with the bins side by side, not on top of each other.
In conclusion, I want to draw two separate histograms with correct proportions calculated within each group. If there is no easy way to do this, I can live with one graph but having the bins side by side, and with correct proportions for each group. In this example, number 1 should have 0.6 as its proportion.
By changing ..count../sum(..count..) to ..density.., it gives you the desired proportion
ggplot(data=d)+geom_histogram(aes(x=n,y=..density..),binwidth = 1)+facet_wrap(~group)
You actually have the separation of charts by variable correct! Especially with ggplot, you sometimes need to consider the scales of the graph separately from the shape. facet_wrap() splits your data into panels, regardless of scale. It will behave the same, no matter what your axes are. You could also try adding scale_y_log10(), and you'll notice that the overall shape and style of your graph is the same; you've just changed the axes.
What you actually need is a fix to your scales. Understandable: frequency plots can be confusing. ..count../sum(..count..) treats each bin as an independent unit, regardless of its value. See a good explanation of this here: Show % instead of counts in charts of categorical variables
What you want is ..density.., which is the count divided by the total count and by the bin width, computed within each panel. With binwidth = 1 it is therefore exactly the proportion within each group. The difference is subtle in principle, but the important bit is that the value on the x-axis matters. For an extreme case of this, see here: Normalizing y-axis in histograms in R ggplot to proportion, where tiny x-axis values produced huge densities.
Your original code will still work, just substituting the aesthetics I described above.
ggplot(data=d)+geom_histogram(aes(x=n,y=..density..),binwidth = 1)+facet_wrap(~group)
If you're still confused about density, so are lots of people. Hadley Wickham wrote a long piece about it, you can find that here: http://vita.had.co.nz/papers/density-estimation.pdf
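To check the numbers rather than read them off the plot, you can pull the computed values out of the built plot with layer_data(). This sketch uses after_stat(density), the newer spelling of ..density.. (assuming ggplot2 3.3.0 or later), on the data from the question.

```r
library(ggplot2)

# Data from the question: n is recycled, so each group has 20 rows.
d <- data.frame(n = rep(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3), 2),
                group = rep(c("A", "B"), each = 20),
                stringsAsFactors = FALSE)

p_dens <- ggplot(d) +
  geom_histogram(aes(x = n, y = after_stat(density)), binwidth = 1) +
  facet_wrap(~group)

# One row per bin per panel, with the computed count and density.
bins <- layer_data(p_dens)
print(bins[, c("PANEL", "x", "count", "density")])

# Tallest bar in each panel (the bin containing n = 1).
max_by_panel <- tapply(bins$density, bins$PANEL, max)
print(max_by_panel)
```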

ggplot2: Why symbol sizes differ when 'size' is including inside vs outside aes statement?

I have created quite a few maps using base-R but I am now trying to perform similar tasks using ggplot2 due to the ease by which multiple plots can be arranged on a single page. Basically, I am plotting the locations at which samples of a particular species of interest have been collected and want the symbol size to reflect the total weight of the species collected at that location. Creating the base map and various layers has not been an issue but I'm having trouble getting the symbol sizes and associated legend the way I want them.
The problem is demonstrated in the workable example below. When I include 'size' outside of aes, the symbol sizes appear to be scaled appropriately (plot1). But when I put 'size' inside the aes statement (in order to get a legend) the symbol sizes are no longer correct (plot2). It looks like ggplot2 has rescaled the data. This should be a simple task so I am clearly missing something very basic. Any help understanding this would be appreciated.
library(ggplot2)
#create a very simple dataset that includes locations and total weight of samples collected from each site
catch.data<-data.frame(long=c(-50,-52.5,-52,-54,-53.8,-52),
lat=c(48,54,54,55,52,50),
wt=c(2,38,3,4,25,122))
#including 'size' outside of aes results in no legend
#but the symbol sizes are represented correctly
plot1<-ggplot(catch.data,aes(x=long,y=lat)) +
geom_point(size=catch.data$wt,colour="white",fill="blue",shape=21)
#including 'size' within aes appears necessary in order to create a legend
#but the symbol sizes are not represented correctly
plot2<-ggplot(catch.data,aes(x=long,y=lat)) +
geom_point(aes(size=catch.data$wt),colour="white",fill="blue",shape=21)
First, you shouldn't reference the data frame name inside of aes; it messes up the legend. So the correct version will be
plot3 <- ggplot(catch.data,aes(x=long,y=lat)) +
geom_point(aes(size=wt),colour="white",fill="blue",shape=21)
Now in order to demonstrate variety you should play around with the range argument of scale_size_continuous, e.g.
plot3 + scale_size_continuous(range = range(catch.data$wt) / 5)
Change it a few times and see which one works for you. Please note that there exists a common visualization pitfall of representing numbers as areas (google e.g. "why pie charts are bad").
Edit: answering the comment below, you could introduce a fixed scaling by e.g.
scale_size_continuous(limits = c(1, 200), range = c(1, 20)).
Any value within the aes() is mapped to the variables in the data, while that is not the case for values specified outside the aes()
Refer to Difference between passing options in aes() and outside of it in ggplot2
Also the documentation : http://ggplot2.tidyverse.org/reference/aes.html
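If you want the raw values used as sizes (as in plot1) but still want a legend, one possibility is an identity scale. This is a sketch on the data from the question; whether sizes as large as 122 are desirable is a separate matter.

```r
library(ggplot2)

catch.data <- data.frame(long = c(-50, -52.5, -52, -54, -53.8, -52),
                         lat  = c(48, 54, 54, 55, 52, 50),
                         wt   = c(2, 38, 3, 4, 25, 122))

# size is mapped inside aes(), but scale_size_identity() tells ggplot2
# to use the values as given instead of rescaling them to its default range.
plot4 <- ggplot(catch.data, aes(x = long, y = lat)) +
  geom_point(aes(size = wt), colour = "white", fill = "blue", shape = 21) +
  scale_size_identity(guide = "legend")

# The sizes that will actually be drawn.
drawn_sizes <- layer_data(plot4)$size
print(drawn_sizes)
```

Without the identity scale, the same mapping is rescaled to the default size range, which is why plot2 in the question looks different from plot1.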

varying stat_binhex() size in ggplot2

I'm trying to use the stat_binhex() in ggplot2 to drop hex tiles on a plot, and the automatic settings vary the color of the bins, depending on count. That is, all the hexes are the same size, but have different colors.
I want to vary the size of the hex symbol itself, so that some are bigger than others, and I also want to vary color based on a third variable. I read through the documentation of ggplot2 and couldn't find any way to do this. The *hexbin* package has an option like this (lattice) but its plot() functions are maddening, so I was hoping to stay in ggplot2. Any other suggestions would be extremely helpful, as well.
If you know Kirk Goldsberry's NBA shot charts on Grantland, that's very similar to what I'd like to accomplish with my dataset.

Resources