Change alpha value for certain break values in ggplot geom_point - r

I have made a scatter plot from more than 100k points, and I would like the coloured points (break values 1 and 2, which are "green", and break value 20, which is "red") to stand out more than the "cornsilk1" points (break values 3 to 19). I have tried the code below, but with no luck.
Any help would be appreciated.
Thanks so much
p.s. please excuse my juvenile code. I am sure there is a way more effective way to do this...
plotIA<-ggplot(plotintaobs,aes(x=SD13009PB,y=SD13009PB2,colour=quartile))+geom_point()+labs(x="Phillips Observation 1", y="Phillips Observation 2") + ggtitle("Intra-observer Variation") + mytheme
plotIA+ scale_color_manual(breaks = c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20"),
values=c("green","green", "cornsilk1", "cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","red"))
plotIA+scale_alpha_manual(values=c(1,1,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,1))

One strategy is to use cut to split the quartiles into your three groups. Then you can map both colour and alpha to the new variable and set their values with scale_colour_manual and scale_alpha_manual:
library(ggplot2)
# some fake data
plotintaobs <- data.frame(SD13009PB = rnorm(20), SD13009PB2 = rnorm(20), quartile = 1:20)
#cut quartile
plotintaobs$q2 <- cut(plotintaobs$quartile, breaks = c(0, 2, 19, 20), labels = c("low", "mid", "high"))
#plot
plotIA <- ggplot(plotintaobs, aes(x = SD13009PB, y = SD13009PB2, colour = q2, alpha = q2)) +
geom_point() +
scale_colour_manual(values = c("green", "cornsilk1","red")) +
scale_alpha_manual(values = c(1, 0.8, 1))
plotIA
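The attempt in the question had no visible effect because alpha was never mapped inside aes(), so scale_alpha_manual() had nothing to act on. Below is a minimal sketch, assuming ggplot2 is installed, that checks which quartiles land in each group and uses named vectors so every level is tied to its colour and alpha explicitly. The 0.4 alpha is the value from the question; the seed is only there to make the fake data reproducible.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only for reproducible fake data
plotintaobs <- data.frame(SD13009PB = rnorm(20),
                          SD13009PB2 = rnorm(20),
                          quartile = 1:20)

# (0,2] -> low, (2,19] -> mid, (19,20] -> high
plotintaobs$q2 <- cut(plotintaobs$quartile, breaks = c(0, 2, 19, 20),
                      labels = c("low", "mid", "high"))
group_counts <- table(plotintaobs$q2)
print(group_counts)

# Named values are matched to the factor levels by name, not by position
p <- ggplot(plotintaobs,
            aes(x = SD13009PB, y = SD13009PB2, colour = q2, alpha = q2)) +
  geom_point() +
  scale_colour_manual(values = c(low = "green", mid = "cornsilk1", high = "red")) +
  scale_alpha_manual(values = c(low = 1, mid = 0.4, high = 1))
p
```

With 100k+ points it may also help to sort the data so that the "mid" rows come first, since points are drawn in row order and later rows end up on top.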

Related

Layering violin plots with geom_violin to compare distributions

I am trying to compare the distributions of a continuous variable across groups using violin plots. Pretty easy. However, I would like to make comparisons across distributions easier by showing the distribution for one of the groups (the reference) in grey with a low alpha value in the background. Something like this but with a violin plot:
My current approach plots the data twice. For the first geom_violin, I duplicate the data for the reference group and plot it in grey. For the second geom_violin, I use the actual data d. In this example, the two violin plots in grey and blue should look the same for the group "blue". However, they are NOT the same even though they are based on exactly the same data for group "blue".
How can I resolve this problem? Or is there another better approach to do this?
library(dplyr)
library(ggplot2)
d <- tibble(
group = sample(c("green", "blue"), 1000, replace = TRUE, prob = c(0.7, 0.3)),
x = ifelse(group == "green", rnorm(1000, 1, 1), rnorm(1000, 0, 3))
)
dblue <- filter(d, group == "blue")
dblue <- bind_rows(dblue, mutate(dblue, group = "green"))
ggplot(d, aes(x = factor(group), y = x)) +
geom_violin(data = dblue, fill = alpha("#333333", 0.2), color = alpha("#333333", 0)) +
geom_violin(fill = alpha("#0072B2", 0.8), color = alpha("#0072B2", 0))
Add scale = "width" to the second geom_violin, so that every violin in that layer is drawn with the same maximum width:
ggplot(d, aes(x = factor(group), y = x)) +
geom_violin(data = dblue, fill = alpha("#333333", 0.2), color = alpha("#333333", 0)) +
geom_violin(fill = alpha("#0072B2", 0.8), color = alpha("#0072B2", 0),
scale = "width")
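My understanding of why this helps: with the default scale = "area", the violins within a layer are sized relative to each other, so the group with the flatter density ends up narrower. In the grey layer both groups hold identical data and therefore get the same width, while in the blue layer the "blue" group is drawn next to a much more peaked "green" group. The base-R sketch below only illustrates that idea with density(); it is an approximation of the scaling, not ggplot2's internal code, and the sample sizes are assumptions chosen to mirror the 0.7/0.3 split in the question.

```r
set.seed(1)  # arbitrary seed for reproducibility
green <- rnorm(700, mean = 1, sd = 1)
blue  <- rnorm(300, mean = 0, sd = 3)

# Highest point of each kernel density estimate
peak_green <- max(density(green)$y)
peak_blue  <- max(density(blue)$y)

# Relative maximum width when both groups share one scaling
rel_width <- c(green = peak_green, blue = peak_blue) / max(peak_green, peak_blue)
print(round(rel_width, 2))
```

The "blue" entry comes out well below 1, which corresponds to the thinner blue violin in the second layer. With scale = "width" each violin is stretched to the same maximum width, so the two layers line up.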

How to get different colors related to treatment for boxplot and violin plot (ggplot / using geom_split_violin) that are plotted in one?

I am trying to show a boxplot and a violin plot in one.
I can fill in the colors of the boxplot and violin plot based on the treatment. But, I don't want them in exactly the same color, I'd prefer the violin plot or the boxplot filling to be lighter.
Also, I am able to get the outer lines of the boxplot in different colors if I add col=TM to the aes of the geom_boxplot. But, then I can not choose these colors or don't know how to (they are now automatically pink and blue).
BACKGROUND:
I am working with a data set that looks something like this:
TM yax X Zscore
Org zscore zhfa -1.72
Org zscore zfwa -0.12
I am plotting the z-scores based on the X (zhfa etc.) per treatment (TM).
#Colours
ocean = c('#BBDED6' , '#61C0BF' , '#FAE3D9' , '#FFB6B9' )
## Plot ##
z <- ggplot(data = data, aes(x = X, y = Zscore,fill=TM)) +
geom_split_violin(col="white", fill="white") +
geom_boxplot(alpha = 1, width=0.3, aes(fill=TM), position = position_dodge(width = 0.3))
z + theme(axis.text = element_text(size = 12),legend.position="top") +
stat_compare_means(method="t.test", label.y=2.8, label.x=0.3, size=3) +
scale_fill_manual(values=ocean, labels=c("Mineral fertilizer", "Organic fertilizer"))
Now, half of the violin plot is filled white, but not both halves (which would already be better). If I plot geom_split_violin() without overriding the fill, it gets exactly the same colors as the boxplot.
Furthermore, the violin plot of zhfa should be on the left side, but it gets switched and is displayed on the right side, while it matches the data of the organic (left) boxplot.
The graph now:
I don't know if it can be solved by adding something related to the scale_fill_manual or if this is an impossible request
Sample Data:
data <- data.frame(TM = c(rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5)),
Zscore = runif(30,-2,2),
X = c(rep("zwfa", 10), rep("zhfa", 10), rep("zbfa", 10)))
You can add an additional column to your data that is the same structure as TM but different values, then scale the fill:
Sample Data:
data <- data.frame(TM = c(rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5)),
Zscore = runif(30,-2,2),
X = c(rep("zwfa", 10), rep("zhfa", 10), rep("zbfa", 10)))
Begin solution:
data <- data %>% mutate(TMm = c(rep("orgM", 5), rep("minM", 5),rep("orgM", 5), rep("minM", 5),rep("orgM", 5), rep("minM", 5)))
#Colours
ocean = c('#BBDED6' , '#FAE3D9', '#61C0BF' , '#FFFFFF')
## Plot ##
z <- ggplot(data = data, aes(x = X, y = Zscore,fill=TM)) +
geom_split_violin(mapping = aes(fill=TMm)) +
geom_boxplot(alpha = 1, width=0.3, aes(fill=TM), position = position_dodge(width = 0.3))
z + theme(axis.text = element_text(size = 12),legend.position="top") +
stat_compare_means(method="t.test", label.y=2.8, label.x=0.3, size=3) +
scale_fill_manual(breaks = c("org", "min"), values=ocean, labels=c("Organic fertilizer", "Mineral fertilizer"))
In your data you may have to change breaks = c("org", "min") to whatever you call the factor levels in the TM variable
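To avoid depending on the order of the colour vector, the values can be named after the levels they belong to. The following is a sketch under assumptions: it uses the standard geom_violin() instead of geom_split_violin() (which is not part of ggplot2 and has to be defined separately), the seed is arbitrary, and the assignment of colours to levels is only an illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for reproducible sample data
data <- data.frame(TM = rep(rep(c("org", "min"), each = 5), 3),
                   Zscore = runif(30, -2, 2),
                   X = rep(c("zwfa", "zhfa", "zbfa"), each = 10))

# Second grouping column with the same structure as TM, used only for the violins
data$TMm <- paste0(data$TM, "M")

# Named colours: darker shades for the boxplots, lighter ones for the violins
ocean <- c(min = "#61C0BF", org = "#FFB6B9",
           minM = "#BBDED6", orgM = "#FAE3D9")

p <- ggplot(data, aes(x = X, y = Zscore)) +
  geom_violin(aes(fill = TMm), colour = NA,
              position = position_dodge(width = 0.9)) +
  geom_boxplot(aes(fill = TM), width = 0.3,
               position = position_dodge(width = 0.9)) +
  # breaks limits the legend to the boxplot levels; labels pair with breaks by position
  scale_fill_manual(values = ocean, breaks = c("min", "org"),
                    labels = c("Mineral fertilizer", "Organic fertilizer"))
p
```

Because the values are matched by name, adding or reordering levels does not silently shift the colours.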
Or if you want the whole violin plot white:
ocean = c('#BBDED6' , '#FFFFFF', '#61C0BF' , '#FFFFFF')
New Plot:

two histograms in one plot (ggplot)

Well, I've been looking in this site to make two histograms in one plot.
I get to
ggplot()+geom_histogram(data=etapa1, aes(x=AverageTemperature),col="red")+
geom_histogram(data=etapa2, aes(x=AverageTemperature),col="blue")
I've got two histograms with different colours, but I don't get a legend or a label which shows which is each colour. How can I produce it?
As Spacedman said, it would be better if you could describe your problem in more detail and give an example data set.
So I created a random sample that simulates temperatures.
etapa1 <- data.frame(AverageTemperature = rnorm(100000, 16.9, 2))
etapa2 <- data.frame(AverageTemperature = rnorm(100000, 17.4, 2))
#Now, combine your two dataframes into one. First make a new column in each.
etapa1$e <- 'etapa1'
etapa2$e <- 'etapa2'
# combine the two data frames etapa1 and etapa2
combo <- rbind(etapa1, etapa2)
ggplot(combo, aes(AverageTemperature, fill = e)) + geom_density(alpha = 0.2)
For me it seems more obvious to use a density plot rather than a histogram since temperatures are real numbers.
Hope this helps somehow...
If you don't want to combine the two data.frames it is a bit more tricky...
You have to give each layer a constant fill label inside aes() and then use scale_fill_manual to map those labels to colours and legend text (scale_colour_manual is only needed if you also map colour):
ggplot() +
geom_density(data = etapa1, aes(x = AverageTemperature, fill = "r"), alpha = 0.3) +
geom_density(data = etapa2, aes(x = AverageTemperature, fill = "b"), alpha = 0.3) +
scale_colour_manual(name ="etapa", values = c("r" = "red", "b" = "blue"), labels=c("b" = "blue values", "r" = "red values")) +
scale_fill_manual(name ="etapa", values = c("r" = "red", "b" = "blue"), labels=c("b" = "blue values", "r" = "red values"))
You can replace geom_density() with geom_histogram() respectively.
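A sketch of the histogram variant, assuming ggplot2 is installed. The smaller sample size, the seed, the bin width and the legend keys "etapa1"/"etapa2" are choices made for this example only.

```r
library(ggplot2)

set.seed(1001)  # arbitrary seed for reproducibility
etapa1 <- data.frame(AverageTemperature = rnorm(1000, 16.9, 2))
etapa2 <- data.frame(AverageTemperature = rnorm(1000, 17.4, 2))

# A constant inside aes() creates a legend entry for each layer
p <- ggplot() +
  geom_histogram(data = etapa1,
                 aes(x = AverageTemperature, fill = "etapa1"),
                 alpha = 0.5, binwidth = 0.25) +
  geom_histogram(data = etapa2,
                 aes(x = AverageTemperature, fill = "etapa2"),
                 alpha = 0.5, binwidth = 0.25) +
  scale_fill_manual(name = "etapa",
                    values = c(etapa1 = "red", etapa2 = "blue"))
p
```

Since the two histograms are separate layers they are drawn on top of each other rather than stacked, which is why a semi-transparent fill is used.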
Using @TimoWagner's example:
set.seed(1001)
etapa1 <- data.frame(AverageTemperature = rnorm(100000, 16.9, 2))
etapa2 <- data.frame(AverageTemperature = rnorm(100000, 17.4, 2))
Here's another way to pack the two data sets together:
combdat <- dplyr::bind_rows(list(dat1=etapa1,dat2=etapa2),
.id="dataset")
Two superimposed histograms:
library(ggplot2)
ggplot(combdat,aes(AverageTemperature,fill=dataset))+
scale_fill_manual(values=c("red","blue"))+
geom_histogram(alpha=0.5,binwidth=0.1,position="identity")

How to map ggplot histogram x-axis intervals to fixed colour palette?

I am trying to stratify my ggplot2 histogram into fixed intervals and colour them based on a specific colour palette: 'x<4':black; '4<x<6':blue; '6<x<8':yellow; and so on...
I tried 2 ways, both of which didn't work.
Referring to my code below, alternative 1 fails when NoOfElement falls to a small figure, say, 500, and there is no element in the first interval 'x<4'. ggplot2 then assigns 'black' to whichever is the first interval (this would be '4<x<6' when size=500). But this is not what I want (see picture).
In Alternative 2, I created another variable in my data frame and assigned the colours for each element. I did this based on a modification of the solution given in: Set specific fill colors in ggplot2 by sign. Unfortunately, the resulting histogram has colours randomly assigned by ggplot2.
I'm quite stuck and would really appreciate some help. Thanks in advance!
Sample code:
library(ggplot2)
NoOfElement <- 5000; MyBreaks <- c(-Inf, seq(4, 16, by=2), Inf)
MyColours <- c("black", "blue", "yellow", "green", "gray", "brown", "purple", "red")
set.seed(2)
c <- data.frame(a=rnorm(NoOfElement, 10, 2), b=rep(NA, NoOfElement))
c$b <- cut(c$a, MyBreaks)
try <- 1 # Allows toggling of alternatives below
if (try==1)
{
p <- ggplot( c, aes(x=c$a, fill=c$b) ) + geom_histogram( binwidth=0.2 ) +
scale_fill_manual(breaks = levels(c$b), values = MyColours,
name = "X Intervals") +
scale_x_continuous( limits=c(2, 20))
}else
{
c$BarCol <- factor(c$b, levels = levels(c$b), labels = MyColours)
p <- ggplot( c, aes(x=c$a, fill=c$b) ) + geom_histogram( binwidth=0.2 ) +
scale_fill_manual(values = c$BarCol, name = "X Intervals") +
scale_x_continuous( limits=c(2, 20))
}
plot (p)
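The scale_*_manual family has a drop argument. Setting drop = FALSE keeps unused factor levels in the scale, so the colours stay aligned with the intervals even when an interval is empty: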
NoOfElement <- 500; MyBreaks <- c(-Inf, seq(4, 16, by=2), Inf)
MyColours <- c("black", "blue", "yellow", "green", "gray", "brown", "purple", "red")
set.seed(2)
c <- data.frame(a=rnorm(NoOfElement, 10, 2), b=rep(NA, NoOfElement))
c$b <- cut(c$a, MyBreaks)
p <- ggplot( c, aes(x=c$a, fill=c$b) ) + geom_histogram( binwidth=0.2 ) +
scale_fill_manual(breaks = levels(c$b), values = MyColours,
name = "X Intervals", drop=FALSE)
Related question here.
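A further safeguard is to name the colour vector after the factor levels, so that each interval is tied to its colour by name rather than by position. This is a sketch assuming ggplot2 is installed; the data frame is called d here (instead of c, which masks the base function) and the columns are referenced by name inside aes().

```r
library(ggplot2)

MyBreaks  <- c(-Inf, seq(4, 16, by = 2), Inf)
MyColours <- c("black", "blue", "yellow", "green", "gray", "brown", "purple", "red")

set.seed(2)
d <- data.frame(a = rnorm(500, 10, 2))
d$b <- cut(d$a, MyBreaks)          # keeps all 8 levels, even empty ones

# Tie each interval to its colour by name
names(MyColours) <- levels(d$b)
print(table(d$b))                  # shows which intervals are empty

p <- ggplot(d, aes(x = a, fill = b)) +
  geom_histogram(binwidth = 0.2) +
  scale_fill_manual(values = MyColours, name = "X Intervals", drop = FALSE)
p
```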

Plotting baseball pitches as qualitative variable by color

I was thinking of doing this in R but am new to it and would appreciate any help
I have a dataset (pitches) of baseball pitches identified by
'pitchNumber' and 'outcome' e.g S = swinging strike, B = ball, H= hit
etc.
e.g.
1 B ;
2 H ;
3 S ;
4 S ;
5 X ;
6 H; etc.
All I want to do is have a graph that plots them in a line cf BHSSXB
but replacing the letter with a small bar colored to represent the letter, with a legend, and optionally having the pitch number above the color . Somewhat like a sparkline.
Any suggestion on how to implement this much appreciated
And the same graph using ggplot.
Data courtesy of @GavinSimpson.
ggplot(baseball, aes(x=pitchNumber, y=1, ymin=0, ymax=1, colour=outcome)) +
geom_point() +
geom_linerange() +
ylab(NULL) +
xlab(NULL) +
scale_y_continuous(breaks=c(0, 1)) +
theme(
panel.background=element_blank(),
panel.grid.minor=element_blank(),
axis.text.y = element_blank()
)
Here is a base graphics idea from which to work. First some dummy data:
set.seed(1)
baseball <- data.frame(pitchNumber = seq_len(50),
outcome = factor(sample(c("B","H","S","S","X","H"),
50, replace = TRUE)))
> head(baseball)
pitchNumber outcome
1 1 H
2 2 S
3 3 S
4 4 H
5 5 H
6 6 H
Next we define the colours we want:
## better colours - like ggplot for the cool kids
##cols <- c("red","green","blue","yellow")
cols <- head(hcl(seq(from = 0, to = 360,
length.out = nlevels(with(baseball, outcome)) + 1),
l = 65, c = 100), -1)
then plot the pitchNumber as a height 1 histogram-like bar (type = "h"), suppressing the normal axes, and we add on points to the tops of the bars to help visualisation:
with(baseball, plot(pitchNumber, y = rep(1, length(pitchNumber)), type = "h",
ylim = c(0, 1.2), col = cols[outcome],
ylab = "", xlab = "Pitch", axes = FALSE, lwd = 2))
with(baseball, points(pitchNumber, y = rep(1, length(pitchNumber)), pch = 16,
col = cols[outcome]))
Add on the x-axis and the plot frame, plus a legend:
axis(side = 1)
box()
## note: this assumes that the levels are in alphabetical order B,H,S,X...
legend("topleft", legend = c("Ball","Hit","Swinging Strike","X??"), lty = 1,
pch = 16, col = cols, bty = "n", ncol = 2, lwd = 2)
Gives this:
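The legend above relies on the factor levels being in alphabetical order. One way around that assumption is to build both the colours and the legend text from the levels themselves. This is a sketch using base graphics only; outcome_labels is a hypothetical lookup introduced for this example, and the meaning of "X" is left open as in the original.

```r
set.seed(1)
baseball <- data.frame(pitchNumber = seq_len(50),
                       outcome = factor(sample(c("B", "H", "S", "X"), 50, replace = TRUE),
                                        levels = c("B", "H", "S", "X")))

# Hypothetical lookup from outcome code to legend text
outcome_labels <- c(B = "Ball", H = "Hit", S = "Swinging Strike", X = "X??")

# One colour per level, named by level
cols <- head(hcl(seq(from = 0, to = 360,
                     length.out = nlevels(baseball$outcome) + 1),
                 l = 65, c = 100), -1)
names(cols) <- levels(baseball$outcome)

# Legend text in the same order as the colours
legend_text <- unname(outcome_labels[levels(baseball$outcome)])

with(baseball, plot(pitchNumber, y = rep(1, length(pitchNumber)), type = "h",
                    ylim = c(0, 1.2), col = cols[as.character(outcome)],
                    ylab = "", xlab = "Pitch", axes = FALSE, lwd = 2))
axis(side = 1)
box()
legend("topleft", legend = legend_text, lty = 1, pch = 16,
       col = cols, bty = "n", ncol = 2, lwd = 2)
```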
This is in response to your last comment on @Gavin's answer. I'm going to build off of the data provided by @Gavin and the ggplot2 plot by @Andrie. ggplot() supports the concept of faceting by a variable or variables. Here you want to facet by pitcher and at the pitch limit of 50 per row. We'll create a new variable that corresponds to each row we want to plot separately. The equivalent code in base graphics would entail adjusting mfrow or mfcol in par() and calling separate plots for each group of data.
#150 pitches represents a somewhat typical 9 inning game.
#Thanks to Gavin for sample data.
longGame <- rbind(baseball, baseball, baseball)
#Starter goes 95 pitches, middle relief throws 35, closer comes in for 20 and the glory
longGame$pitcher <- c(rep("S", 95), rep("M", 35), rep("C",20))
#Adjust pitchNumber accordingly
longGame$pitchNumber <- c(1:95, 1:35, 1:20)
#We want to show 50 pitches at a time, so will combine the pitcher name
#with which set of pitches this is
longGame$facet <- with(longGame, paste(pitcher, ceiling(pitchNumber / 50), sep = ""))
#Create the x-axis in increments of 1-50, by pitcher (ddply() comes from the plyr package)
library(plyr)
longGame <- ddply(longGame, "facet", transform, pitchFacet = rep(1:50, 5)[1:length(facet)])
#Convert facet to factor in the right order
longGame$facet <- factor(longGame$facet, levels = c("S1", "S2", "M1", "C1"))
#Thanks to Andrie for ggplot2 function. I change the x-axis and add a facet_wrap
ggplot(longGame, aes(x=pitchFacet, y=1, ymin=0, ymax=1, colour=outcome)) +
geom_point() +
geom_linerange() +
facet_wrap(~facet, ncol = 1) +
ylab(NULL) +
xlab(NULL) +
scale_y_continuous(breaks=c(0, 1)) +
theme(
panel.background=element_blank(),
panel.grid.minor=element_blank(),
axis.text.y = element_blank()
)
You can obviously change the labels for the facet variable, but the above code will produce:
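The row label and the within-row x position can also be derived without plyr. The following base-R sketch reproduces the facet and pitchFacet columns from the pitcher and pitch-number vectors used above; it assumes, as in the example, that pitch numbers restart at 1 for each pitcher.

```r
pitcher     <- c(rep("S", 95), rep("M", 35), rep("C", 20))
pitchNumber <- c(1:95, 1:35, 1:20)

# Row label: pitcher plus which block of 50 pitches this is
facet <- paste0(pitcher, ceiling(pitchNumber / 50))

# Position within the row: 1..50, restarting after every 50th pitch
pitchFacet <- (pitchNumber - 1) %% 50 + 1

# Keep the rows in order of first appearance: S1, S2, M1, C1
facet <- factor(facet, levels = unique(facet))
print(table(facet))
```

The modulo expression gives the same result as the rep(1:50, 5)[1:length(facet)] call, because within each row the pitches are already in order.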
