ggplot stat_bin2d plot with heavily skewed data

I have a set of data that is heavily right-skewed. This creates a problem when doing a stat_bin2d plot: most of the graph is dark blue, with only a few bins in a different color. I'd like the graph to use more of the color range.
An example of the problem comes directly from the ggplot documentation.
ggplot(diamonds, aes(carat, price)) + stat_bin2d()
The resulting graph has only a few positions that are something other than dark blue.
How can I adjust the mapping of the color range to show more detail? I know I can set the limits, but this doesn't exactly fit the bill as it makes anything outside the limits be gray.
ggplot(diamonds, aes(carat, price)) + stat_bin2d() + scale_fill_gradient(limits=c(1, 100))
Something like this, but with the gray areas appropriately colored too.

The quick answer is
ggplot(diamonds, aes(carat, price)) + stat_bin2d() +
scale_fill_gradient(trans="log10")
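If you would rather keep explicit limits, as in the question, one option is the oob argument of the scale. This is a minimal sketch, assuming the scales package is installed (it comes along as a dependency of ggplot2): scales::squish maps counts outside the limits to the nearest limit instead of turning them gray. The limits c(1, 100) are taken from the question, and the name p_squish is only illustrative.

```r
library(ggplot2)

# Counts above 100 are squished to the top colour instead of becoming NA (gray).
p_squish <- ggplot(diamonds, aes(carat, price)) +
  stat_bin2d() +
  scale_fill_gradient(limits = c(1, 100), oob = scales::squish)

p_squish
```

The trade-off is that every bin with more than 100 observations gets the same colour, so detail is gained at the low end and lost at the high end.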
EDIT:
A longer answer is that you probably want some kind of transformation of the color or fill scale. For built-in transformations refer to the "See Also" section of
library(scales)
?trans
If none of the built-in transformations is suitable, you can construct your own. See the answers to this SO question about transforming color scales for an example showing how to do this.
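As a rough sketch of a hand-built transformation, assuming the scales package: trans_new() takes a name, a forward function and its inverse. The cube-root choice and the names cube_root_trans and p_cube are illustrative assumptions, not taken from the linked answer. Recent releases of scales and ggplot2 also offer new_transform() and a transform= argument; the older spellings are used here to match the code above.

```r
library(ggplot2)
library(scales)

# A cube-root transformation: milder than log10, still compresses large counts.
cube_root_trans <- trans_new(
  name = "cube_root",
  transform = function(x) x^(1 / 3),
  inverse = function(x) x^3
)

p_cube <- ggplot(diamonds, aes(carat, price)) +
  stat_bin2d() +
  scale_fill_gradient(trans = cube_root_trans)

p_cube
```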

Related

Adding color disrupts the transitions in a scatterplot

If I create a simple scatterplot and add an animation, I get a beautiful "smooth" transition between states
library(tidyverse)
library(gganimate)
data("diamonds")
ggplot(diamonds) +
aes(carat, price) +
geom_point() +
transition_states(clarity)
But if I want to differentiate the points by color, the "smooth" transition is now lost.
ggplot(diamonds) +
aes(carat, price, color = clarity) +
geom_point() +
transition_states(clarity)
Why does this happen? How can I put it back? (I tried to add eases, but with no result)

To illustrate the comment from @teunbrand: read the object permanence section of the vignette and you will see the same thing. The TL;DR is to specify a group= aesthetic that applies to the entire dataset, like group=1.
ggplot(diamonds) +
aes(carat, price, color=clarity, group=1) +
geom_point() +
transition_states(clarity)
The reason it happens (practically-speaking) is that the smoothing for the animations happens using the group= aesthetic. Without defining a group= aesthetic, all observations belong to the same "group". The transition states specify clarity, so the animations smooth for all observations along the clarity column - transitioning from one clarity set to another.
When you define color= or other aesthetic across this discrete value, ggplot2 plots by cutting the data into groups according to the color specification. Effectively, specifying color=clarity also defines group=clarity. The animation is then made by smoothing from one value of clarity to the next within each group. Since every group contains only one value for clarity... you're smoothing from one transition state to an identical one and get no animation.
Therefore, the fix is to define for gganimate that even though you cut the data into groups for color, the group= aesthetic should still be defined across the entire dataset. You can do this by setting the group= aesthetic to any value. Here, I'm using group=1, but you can literally assign that to anything unchanging: group="happyfuntime" will work just the same.
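You can check the grouping without animating anything. This is a small sketch, assuming only ggplot2: layer_data() returns the data a layer is built from, including the computed group column. The subsample (every 500th row) and the variable names are arbitrary choices to keep it quick.

```r
library(ggplot2)

# A small subsample keeps the example fast.
small <- diamonds[seq(1, nrow(diamonds), by = 500), ]

# A discrete colour mapping implicitly splits the data into groups.
p_implicit <- ggplot(small, aes(carat, price, color = clarity)) +
  geom_point()

# An explicit constant group overrides that implicit split.
p_single <- ggplot(small, aes(carat, price, color = clarity, group = 1)) +
  geom_point()

n_groups_implicit <- length(unique(layer_data(p_implicit)$group))
n_groups_single <- length(unique(layer_data(p_single)$group))

n_groups_implicit  # one group per clarity level present in the subsample
n_groups_single    # a single group
```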

ggplot2 + scatterplot + geom_path

Do you know how to get the curved effect Jake Kaupp achieves on his plot?

Looks to be something along the lines of:
ggplot(full_data, aes(y = total_consumption_lbs, x = milk_production_lbs)) +
geom_xspline2(aes(s_open = TRUE, s_shape = 0.5))
Where geom_xspline2() comes from library(ggalt)
But don't ask me, here is his source code:
https://github.com/jkaupp/tidytuesdays/blob/master/2019/week5/R/analysis.R

This approach doesn't look quite as nice as your example, but it's a start, and some fiddling may get you the rest of the way.
First, some data to work with:
set.seed(1) # so the jitter is reproducible
x <- 1:20
y <- jitter(x, amount=1.5)
df <- data.frame(x, y)
The approach using ggplot2 is to draw a geom_smooth with a very small span (small enough to cause lots of warnings, as you'll see), and then plot points with white borders over the top of that.
ggplot(df, aes(x,y)) +
geom_smooth(se=FALSE, colour="black", span=0.15) +
geom_point(fill="black", colour="white", shape=21, size=2.5) +
theme_minimal()
The downsides: as noted above, you'll see many warnings about singularities in the loess fit, because the span is so small. Second, not all of the points are centred on the line, which makes sense since the line is a loess fit rather than an interpolation. Lastly, the white border around the points comes out quite thin with these settings.
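Two of those downsides can be worked around in recent ggplot2 versions. This is a sketch under assumptions, not the method used for the original plot: it swaps the loess fit for an interpolating spline from stats::spline(), so the curve passes through every point, and it uses the stroke aesthetic of geom_point() to widen the white border. The seed, n = 200 and stroke = 1.5 are arbitrary choices.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(x = 1:20)
df$y <- jitter(df$x, amount = 1.5)

# Interpolating spline evaluated on a fine grid; it passes through each (x, y).
curve_df <- as.data.frame(spline(df$x, df$y, n = 200))

p_spline <- ggplot(df, aes(x, y)) +
  geom_path(data = curve_df, colour = "black") +
  # stroke sets the border width for shapes 21-25
  geom_point(fill = "black", colour = "white", shape = 21, size = 2.5, stroke = 1.5) +
  theme_minimal()

p_spline
```

Because the spline interpolates rather than smooths, it can overshoot between noisy points; whether that looks better than the loess line is a matter of taste.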

Removing all colors from a ggplot2 linechart

Working with RStudio 0.98.1103, I am creating two versions of exactly the same graph: One with colors and one without. Since both graphs are exactly the same (apart from the coloring) I want to avoid typing nearly the same commands again. Hence, I create the colored plot, save it, manipulate it to make it black-grey-white and save the reduced version:
library(ggplot2)
bp <- ggplot(data=PlantGrowth, aes(x=group, y=weight)) +
geom_line(aes(color=group)) + theme(legend.position="none")
bp_bw <- bp + theme_bw() +
geom_line() + theme(legend.position="none")
ggsave("bp_bw.png", bp_bw)
Although bp looks quite normal, bp_bw doesn't. There is still a blurry color shining behind the black bars (red - green - blue):
Closeup:
How can I get rid of these colors, i.e. remove all color completely from bp? Only restriction: I have to create the colored graphs first (although of course a different order would work).

I think a better solution is to create a base and only add the coloring part when needed:
bp <- ggplot(data=PlantGrowth, aes(x=group, y=weight)) +
theme_bw() + theme(legend.position="none")
bp_col <- bp + geom_line(aes(color=group))
bp_bw <- bp + geom_line()

This (more-or-less) makes sense. Your bp_bw code doesn't get rid of the old colored lines, it just adds black lines on top. Anti-aliasing as the image is displayed/saved lets some of the color through on the edges.
My recommendation is to modify the color scale rather than overplot black on top:
bp_bw2 = bp + scale_color_manual(values = rep("black", 20)) + theme_bw()
This will change the colors to all black rather than plotting black on top of colors. The rep("black", 20) is kind of a hack. Apparently values aren't recycled by scale_color_manual, but extra values aren't used so you need to give it a vector at least as long as the number of colors.
This also has the advantage of not needing to repeat the geom call, and if you had previously defined a color scale this will overwrite it. If you want to be more general you could also add a scale_fill_manual(), and you probably want to specify guide = "none" (guide = FALSE in older ggplot2 versions) so that you don't get a very unhelpful legend.
You also might want to check out scale_colour_grey: just because it's B&W doesn't mean all the colors have to be the same.
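A minimal sketch of the scale_colour_grey() route, reusing the colored plot from the question. The start and end values are arbitrary (0 is black, 1 is white), and bp_grey is a made-up name. Note that theme_bw() is a complete theme, so the legend setting has to be added again after it.

```r
library(ggplot2)

# The colored plot from the question.
bp <- ggplot(data = PlantGrowth, aes(x = group, y = weight)) +
  geom_line(aes(color = group)) +
  theme(legend.position = "none")

# Replace the color scale with shades of grey instead of overplotting.
bp_grey <- bp +
  scale_colour_grey(start = 0, end = 0.6) +
  theme_bw() +
  theme(legend.position = "none")

bp_grey
```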

Overriding default colours in a ggplot diagram

I'm fairly new to R and am trying to change the colours of my generated diagram.
p = ggplot(plasma1, aes(x=Day, y=Control, colour=Supp)) +
theme(panel.background = element_rect(fill='white', colour='black')) +
geom_point(size=2, shape=21) +
geom_errorbar(aes(ymin=Control-SEMcontrol, ymax=Control+SEMcontrol), width=1)
p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
xlab("X") + ylab("Y") + geom_line(linetype="dashed")
I've looked at theme() but I can only seem to change the grid lines and not the trend line. (Ideally I would like to change red & blue to red & black.)

Note: The convention on SO is that "answers" are reserved for reproducible code that demonstrates a solution. Anything less (like a suggestion) belongs in a comment. This is why it is so essential that questioners provide their data as part of the question; otherwise we have to make some up for you, which most people are (justifiably) unwilling to do.
The answer you asked for is below, but before getting into that you should be aware that ggplot's default color scheme is carefully chosen, so you should only change it if there is a good reason. The problem is that human evolution has caused certain colors (like red) to get a perceptual boost relative to other colors. So if you have a red curve and a black curve, the red curve leaves a stronger "impression". This fact is used extensively in certain fields (like advertising) to psychologically manipulate the viewer, but it has no place in scientific data visualization. The ggplot defaults, which are based on the HCL color system (which in turn is based on the Munsell color system), try to achieve two objectives: to create a color palette where each color is maximally distinguishable from all the other colors, and to even out the relative perceptual impact. There is a fairly technical discussion of this topic here, and some nice examples here.
Bottom line: don't change the colors unless you have a really good reason to do so.
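To see which colors the default scale picks for two groups, here is a small sketch, assuming the scales package (which ggplot2 uses for its palettes). hue_pal() returns evenly spaced hues at a fixed chroma and luminance; the variable name default_cols is just for illustration.

```r
library(scales)

# The default discrete palette for two groups.
default_cols <- hue_pal()(2)
default_cols
# "#F8766D" "#00BFC4"
```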
Having said all that, the simple answer to your question is to use scale_color_manual(...), as below:
# all this to set up the example - you have this already
set.seed(1) # for reproducible example
x <- rep(c(1,2,4,8,11,14), each=5)
df1 <- data.frame(Day=x,Control=125*(1-exp(-x/5))+rnorm(30,sd=25),Supp="N")
df2 <- data.frame(Day=x,Control=90*(1-exp(-x/3))+rnorm(30,sd=25),Supp="C")
plasma1 <- aggregate(Control~Day+Supp,rbind(df2, df1), FUN=function(x)c(Control=mean(x),SEMcontrol=sd(x)/sqrt(length(x))))
plasma1 <- data.frame(plasma1[,1:2],plasma1[[3]])
# you start here
library(ggplot2)
ggp <- ggplot(plasma1, aes(x=Day, y=Control, color=Supp))+
geom_point(size=3, shape=21)+
geom_line(linetype="dashed")+
geom_errorbar(aes(ymax=Control+SEMcontrol, ymin=Control-SEMcontrol), width=0.3)+
theme_bw()+theme(panel.grid=element_blank())
ggp + scale_color_manual(values=c(C="red",N="black"))
Which produces this:
As mentioned in one of the comments, you could also use one of the Brewer Palettes developed by Prof. Cynthia Brewer at Penn State. These were originally intended for cartographic applications, but have become widely used generally in scientific visualization.
ggp + scale_color_brewer(palette="Set1")

ggplot2: plotting two size aesthetics

From what I can find on stackoverflow, (such as this answer to using two scale colour gradients on one ggplot) this may not (yet) be possible with ggplot2.
I want to create a bubbleplot with two size aesthetics, one always larger than the other. The idea is to show the proportion as well as the absolute values. Now I could colour the points by the proportion but I prefer multi-bubbles. In Excel this is relatively simple. (http://i.stack.imgur.com/v5LsF.png) Is there a way to replicate this in ggplot2 (or base)?

Here's an option. Mapping size in two geom_point layers should work. It's a bit of a pain getting the sizes right for bubblecharts in ggplot though.
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point(aes(size = disp), shape = 1) +
geom_point(aes(size = hp/(2*disp))) + scale_size_continuous(range = c(15,30))
To get it looking most like your example, add theme_bw():
P <- p + theme_bw()
The scale_size_continuous() is where you have to just fiddle around till you're happy - at least in my experience. If someone has a better idea there I'd love to hear it.
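One thing to keep in mind is that both layers share a single size scale, so the two variables are rescaled together. An alternative sketch, offered as an assumption rather than an established recipe: compute both sizes yourself and pass them through scale_size_identity(). The columns outer_size and inner_size, the range c(5, 20) and the use of hp/(2*disp) as the proportion are all illustrative choices.

```r
library(ggplot2)
library(scales)

cars <- mtcars
share <- cars$hp / (2 * cars$disp)  # stand-in proportion, below 1 for every row

# Outer bubble size in point-size units; inner bubble scaled so that its
# area is roughly `share` of the outer area.
cars$outer_size <- rescale(cars$disp, to = c(5, 20))
cars$inner_size <- cars$outer_size * sqrt(share)

p_bubbles <- ggplot(cars, aes(mpg, wt)) +
  geom_point(aes(size = outer_size), shape = 1) +
  geom_point(aes(size = inner_size)) +
  scale_size_identity() +
  theme_bw()

p_bubbles
```

With an identity scale there is no automatic size legend, so the sizes would need to be explained in a caption or annotation.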
