one of my box plots is being split into 2? - r

Load the data:
mydata <- JRdata
Build the ggplot, reordering the boxes by their medians:
p <- ggplot(mydata, aes(x = reorder(as.factor(female), occlusion20, FUN = median, na.rm = TRUE),
                        y = occlusion20,
                        fill = FamilySelectionMetagenomics,
                        colour = FamilySelectionMetagenomics)) +
  geom_boxplot() +
  scale_fill_manual(values = c("white", "red")) +
  scale_colour_manual(values = c("black", "black"))
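The question includes no data, so the following is only a guess at the cause. geom_boxplot() draws one box per combination of x and any other discrete aesthetic, so a box splits in two when a single x level has rows in more than one fill group. A minimal sketch for finding such levels, using hypothetical toy data in place of JRdata (the values are invented for illustration):

```r
# Hypothetical stand-in for JRdata; only the column names follow the question
toy <- data.frame(
  female = c("f1", "f1", "f2", "f2", "f2", "f2"),
  occlusion20 = c(1, 2, 3, 4, 5, 6),
  FamilySelectionMetagenomics = c("no", "no", "no", "no", "yes", "yes")
)

# Count how many distinct fill values occur within each x level
n_fill <- tapply(toy$FamilySelectionMetagenomics, toy$female,
                 function(v) length(unique(v)))

# Levels with more than one fill value would be drawn as two dodged boxes
split_levels <- names(n_fill)[n_fill > 1]
split_levels
```

If a level shows up here in the real data, the split box reflects the data rather than a plotting bug.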

Related

Is there any way to use 2 color scales on the same ggplot?

My data are split into two categories (Samples a and b), and I draw two layers: one for points and one for lines. I want to colour the points by category and also colour the lines by category, but with a different set of colours from the one used for the points.
library(ggplot2)
Sample <- c("a", "b")
Time <- c(0,1,2)
df <- expand.grid(Time=Time, Sample = Sample)
df$Value <- c(1,2,3,2,4,6)
ggplot(data = df,
aes(x = Time,
y = Value)) +
geom_point(aes(color = Sample)) +
geom_line(aes(color = Sample)) +
scale_color_manual(values = c("red", "blue")) + # for points
scale_color_manual(values = c("orange", "purple")) # for lines
Making use of the ggnewscale package this could be achieved like so:
library(ggplot2)
library(ggnewscale)
Sample <- c("a", "b")
Time <- c(0,1,2)
df <- expand.grid(Time=Time, Sample = Sample)
df$Value <- c(1,2,3,2,4,6)
ggplot(data = df,
aes(x = Time,
y = Value)) +
geom_point(aes(color = Sample)) +
scale_color_manual(name = "points", values = c("red", "blue")) + # for points
new_scale_color() +
geom_line(aes(color = Sample)) +
scale_color_manual(name = "lines", values = c("orange", "purple")) # for lines
Using colour columns and scale_color_identity:
df$myCol1 <- rep(c("red", "blue"), each = 3)
df$myCol2 <- rep(c("orange", "purple"), each = 3)
ggplot(data = df,
aes(x = Time,
y = Value)) +
geom_point(aes(color = myCol1)) +
geom_line(aes(color = myCol2)) +
scale_color_identity()
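Note that scale_color_identity() draws no legend by default. If a legend is wanted, one possible sketch is to request it explicitly and supply readable labels; the label texts below are assumptions chosen for illustration, and how the keys are drawn may differ between ggplot2 versions:

```r
library(ggplot2)

df <- expand.grid(Time = c(0, 1, 2), Sample = c("a", "b"))
df$Value <- c(1, 2, 3, 2, 4, 6)
df$myCol1 <- rep(c("red", "blue"), each = 3)       # point colours per sample
df$myCol2 <- rep(c("orange", "purple"), each = 3)  # line colours per sample

cols <- c("red", "blue", "orange", "purple")

p <- ggplot(df, aes(x = Time, y = Value)) +
  geom_point(aes(colour = myCol1)) +
  geom_line(aes(colour = myCol2)) +
  # guide = "legend" turns the legend on; labels are illustrative only
  scale_colour_identity(guide = "legend", breaks = cols,
                        labels = c("a (points)", "b (points)",
                                   "a (lines)", "b (lines)"))

built <- ggplot_build(p)
```

Because both layers share a single colour scale here, all four entries land in one legend; ggnewscale is the better fit when two separate legends are needed.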

geom_histogram with proportions and factor data

I'm trying to consistently plot histograms for zonal statistics from a thematic map. The data within a single zone often looks something like this:
dat <- data.frame("CLASS" = sample(LETTERS[1:6], 250, replace = TRUE,
prob = c(.15, .06, .35, .4, .02, 0)))
dat$CLASS <- factor(dat$CLASS, levels = LETTERS[1:6], ordered = T)
wherein not all possible classes may have been present in the zone.
I can pre-compute the data summary and use geom_bar and a manual colour scale to get consistent bar colours regardless of missing data:
library(dplyr)
library(ggplot2)
library(viridis)
dat_summ <- dat %>%
group_by(CLASS, .drop = FALSE) %>%
summarise(percentage = n() / nrow(.) * 100)
mancols <- viridis_pal()(6)
names(mancols) <- LETTERS[1:6]
ggplot(dat_summ) +
geom_bar(aes(x = CLASS, y = percentage, fill = CLASS),
stat = 'identity', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_manual(values = mancols, drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
But I can't keep the colours consistent across plots when I try to use geom_histogram:
ggplot(dat) +
geom_histogram(aes(x = CLASS,
y = (..count../sum(..count..)) * 100,
fill = ..x..), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_viridis_c() +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
If any of the outside-edge columns (A, F) are count = 0, the colours rescale to where data is present. This doesn't happen if there's a gap in one of the middle classes. Using scale_fill_viridis_b() doesn't solve the problem - it always rescales the palette against the number of non-0 columns.
Is it possible to prevent this behaviour and output consistent colours no matter which columns are count = 0, or am I stuck with my geom_bar approach?
Maybe scale_fill_discrete/scale_fill_viridis_d(drop = F) is what you want (with fill = CLASS).
ggplot(dat) +
geom_histogram(aes(x = CLASS,
y = (..count../sum(..count..)) * 100,
fill = CLASS), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_viridis_d(drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
I think the problem is that you pass the computed variable ..x.. to fill in the aesthetics. That variable is numeric, so it gets a continuous scale whose limits follow the range of the positions actually present, which is why the colours shift when an edge class is empty but not when a middle class is. You could map fill = CLASS and use scale_fill_manual instead; with drop = FALSE (or with named values) you will get the same plot colours regardless of which levels of your CLASS variable are present:
ggplot(dat) +
  geom_histogram(aes(x = CLASS, y = stat(count/sum(count) * 100), fill = CLASS),
                 stat = 'count', show.legend = FALSE) +
  scale_x_discrete(drop = FALSE) +
  scale_fill_manual(values = c("#FF0000FF", "#CCFF00FF", "#00FF66FF", "#0066FFFF", "#CC00FFFF", "#FF99FFFF"),
                    drop = FALSE) +
  labs(x = 'Class', y = 'Percent') +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())
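As a variation on the answers above, the colours can also be pinned by naming the palette vector after the factor levels, so that each class is looked up by name rather than by position. This is a minimal sketch under assumptions: dat2 is a hypothetical zone with classes A, E and F absent, and hcl.colors() (base R 3.6 or later) stands in for viridis_pal().

```r
library(ggplot2)

lv <- LETTERS[1:6]

# Hypothetical zone in which classes A, E and F do not occur
dat2 <- data.frame(CLASS = factor(c("B", "C", "C", "D", "D", "D"), levels = lv))

# One fixed colour per level, keyed by level name
pal <- setNames(hcl.colors(6, "viridis"), lv)

p <- ggplot(dat2) +
  geom_bar(aes(x = CLASS, y = after_stat(count / sum(count) * 100), fill = CLASS),
           show.legend = FALSE) +
  scale_x_discrete(drop = FALSE) +    # keep empty classes on the axis
  scale_fill_manual(values = pal) +   # named values are matched by level name
  labs(x = "Class", y = "Percent")

# Colours actually assigned to the bars that were drawn
fills <- unique(ggplot_build(p)$data[[1]]$fill)
```

With a named vector, B, C and D keep their own colours even though A is missing, which is the behaviour the question asks for.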

adding a label in geom_line in R

I have two very similar plots, each with two y-axes: one for a bar plot and one for a line plot.
Code:
sec_plot <- ggplot(data, aes_string (x = year, group = 1)) +
geom_col(aes_string(y = frequency), fill = "orange", alpha = 0.5) +
geom_line(aes(y = severity))
However, there are no labels. I want a legend label for the bar plot as well as one for the line plot.
How can I add the labels to the plot if there is only one single group? Is there a way to specify this manually? Until now I have only found options where the labels are added by specifying them in the aes.
EXTENSION (added a posteriori):
getSecPlot <- function(data, xvar, yvar, yvarsec, groupvar){
if ("agegroup" %in% xvar) xvar <- get("agegroup")
# data <- data[, startYear:= as.numeric(startYear)]
data <- data[!claims == 0][, ':=' (scaled = get(yvarsec) * max(get(yvar))/max(get(yvarsec)),
param = max(get(yvar))/max(get(yvarsec)))]
param <- data[1, param] # important, otherwise not found in ggplot
sec_plot <- ggplot(data, aes_string (x = xvar, group = groupvar)) +
geom_col(aes_string(y = yvar, fill = groupvar, alpha = 0.5), position = "dodge") +
geom_line(aes(y = scaled, color = gender)) +
scale_y_continuous(sec.axis = sec_axis(~./(param), name = paste0("average ", yvarsec),labels = function(x) format(x, big.mark = " ", scientific = FALSE))) +
labs(y = paste0("total ", yvar)) +
scale_alpha(guide = 'none') +
theme_pubclean() +
theme(legend.title=element_blank(), legend.background = element_rect(fill = "white"))
}
plot.ExposureYearly <- getSecPlot(freqSevDataAge, xvar = "agegroup", yvar = "exposure", yvarsec = "frequency", groupvar = "gender")
plot.ExposureYearly
How can the same be done on a plot where both the line plot as well as the bar plot are separated by gender?
Here is a possible solution. I moved the colour and fill inside aes() and then used scale_*_identity to create and format the legends.
I also had to rescale severity by hand (here, dividing by 1000), because a ggplot2 secondary axis is only a relabelling of the primary axis and does not rescale the data for you.
data<-data.frame(year= 2000:2005, frequency=3:8, severity=as.integer(runif(6, 4000, 8000)))
library(ggplot2)
library(scales)
sec_plot <- ggplot(data, aes(x = year)) +
geom_col(aes(y = frequency, fill = "orange"), alpha = 0.6) +
geom_line(aes(y = severity/1000, color = "black")) +
scale_fill_identity(guide = "legend", labels = "Claim frequency (Number of paid claims per 100 Insured exposure)", name = NULL) +
scale_color_identity(guide = "legend", labels = "Claim Severity (Average insurance payment per claim)", name = NULL) +
theme(legend.position = "bottom") +
scale_y_continuous(sec.axis =sec_axis( ~ . *1, labels = label_dollar(scale=1000), name="Severity") ) + #formats the 2nd axis
guides(fill = guide_legend(order = 1), color = guide_legend(order = 2)) #control which scale plots first
sec_plot
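The fixed factor of 1000 suits this particular data. As a hedged sketch of the same idea, the factor can instead be derived from the data, as the question's extension does with param; the data frame d and the name k below are made up for illustration:

```r
library(ggplot2)

# Hypothetical data; the severity values are invented for illustration
d <- data.frame(year = 2000:2005,
                frequency = 3:8,
                severity = c(4200, 5100, 6300, 4800, 7500, 6900))

# Factor that maps the severity range onto the frequency range
k <- max(d$frequency) / max(d$severity)

p <- ggplot(d, aes(x = year)) +
  geom_col(aes(y = frequency), fill = "orange", alpha = 0.6) +
  geom_line(aes(y = severity * k)) +  # draw severity on the primary scale
  scale_y_continuous(name = "Frequency",
                     # undo the scaling so the second axis reads in severity units
                     sec.axis = sec_axis(~ . / k, name = "Severity"))
```

The line is multiplied by k for drawing and the secondary axis divides by k again, so the two always stay consistent.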

Colour average lines in ggplot

I would like to colour the dashed lines, which are the average values of the two respective categories, with the same colour of the default palette used by ggplot to fill the distributions:
This is the code used:
library(ggplot2)
print(ggplot(dati, aes(x=ECU_fuel_consumption_L_100Km_CF, fill=Model))
+ ggtitle("Fuel Consumption density histogram, by Model")
+ ylab("Density")
+ geom_density(alpha=.3)
+ scale_x_continuous(breaks=pretty(dati$ECU_fuel_consumption_L_100Km_CF, n=10))
+ geom_vline(aes(xintercept = mean(ECU_fuel_consumption_L_100Km_CF[dati$Model == "500X"])), linetype="dashed", size=1)
+ geom_vline(aes(xintercept = mean(ECU_fuel_consumption_L_100Km_CF[dati$Model == "Renegade"])), linetype="dashed", size=1)
)
Thank you all in advance!
No reproducible example, but you probably want to do something like this:
library(dplyr)
library(ggplot2)
# make up some data
d <- data.frame(x = c(mtcars$mpg, mtcars$hp),
var = rep(c('mpg', 'hp'), each = nrow(mtcars)))
means <- d %>% group_by(var) %>% summarize(m = mean(x))
ggplot(d, aes(x, fill = var)) +
geom_density(alpha = 0.3) +
geom_vline(data = means, aes(xintercept = m, col = var),
linetype = "dashed", size = 1)
This approach is extendable to any number of groups.
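To illustrate the point about more groups, here is a small sketch with three groups. It uses the built-in iris data rather than the asker's dati (which is not available) and base aggregate() in place of dplyr:

```r
library(ggplot2)

# One mean per group; works unchanged for any number of groups
means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

p <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.3) +
  # Mapping colour to the same variable as fill gives matching default colours
  geom_vline(data = means,
             aes(xintercept = Sepal.Length, colour = Species),
             linetype = "dashed")
```

Since fill and colour are mapped to the same variable, each dashed line should pick up the default colour of its own density.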
An option that doesn't require pre-calculation, but is also a bit more hacky, is:
ggplot(d, aes(x, fill = var)) +
geom_density(alpha = 0.3) +
geom_vline(aes(col = 'hp', xintercept = x), linetype = "dashed", size = 1,
data = data.frame(x = mean(d$x[d$var == 'hp']))) +
geom_vline(aes(col = 'mpg', xintercept = x), linetype = "dashed", size = 1,
data = data.frame(x = mean(d$x[d$var == 'mpg'])))

showing scale_fill_manual from scratch

I want to show histograms of multiple groups where the values do not stack. I do this by:
dat <- data.frame(x = seq(-3, 3, length = 20))
dat$y <- dnorm(dat$x)
dat$z <- dnorm(dat$x, mean = 2)
p <- ggplot(dat, aes(x = x)) +
geom_bar(aes(y = y), stat = "identity", alpha = .5, fill = "red") +
geom_bar(aes(y = z), stat = "identity", alpha = .5, fill = "blue")
I'd like to have a fill legend that shows the groupings. I'm not sure why this does not produce any legend (or error):
p + scale_fill_manual(values = c(x = "red", z = "blue"),
limits = c("mean 0", "mean 2")) +
guides(fill=guide_legend(title.position="top"))
Using unnamed values produces the same result.
Thanks,
Max
The legend is generated automatically only if you map fill to a variable using aes, like so:
library(reshape2)
ggplot(melt(dat, "x"), aes(x = x)) +
geom_bar(aes(y = value, fill = variable),
stat = "identity", position = "identity", alpha = .5) +
scale_fill_manual(values = c(y = "red", z = "blue"),
labels = c("mean 0", "mean 2"))
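If reshaping is not wanted, a possible alternative is to keep the two layers and map fill to a constant label inside aes() for each one. This is a sketch, not the answerer's method; the label strings are simply reused from the question:

```r
library(ggplot2)

dat <- data.frame(x = seq(-3, 3, length.out = 20))
dat$y <- dnorm(dat$x)
dat$z <- dnorm(dat$x, mean = 2)

p <- ggplot(dat, aes(x = x)) +
  # A constant inside aes() is treated as a one-level variable, so it gets a legend key
  geom_col(aes(y = y, fill = "mean 0"), alpha = 0.5) +
  geom_col(aes(y = z, fill = "mean 2"), alpha = 0.5) +
  scale_fill_manual(name = NULL,
                    values = c("mean 0" = "red", "mean 2" = "blue"))

built <- ggplot_build(p)
```

The two layers are drawn independently, so the bars overlap rather than stack, and the manual scale ties each label to its colour.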
