Is it possible to enforce the stack order when using geom_area()? I cannot figure out why geom_area(position = "stack") produces this strange fluctuation in stack order around 1605.
There are no missing values in the data frame.
library(ggplot2)
counts <- read.csv("https://gist.githubusercontent.com/mdlincoln/d5e1bf64a897ecb84fd6/raw/34c6d484e699e0c4676bb7b765b1b5d4022054af/counts.csv")
ggplot(counts, aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
You need to order your data. In your data, the first value found for each year is 'Flemish' until 1605, and from 1606 the first value is 'Dutch'. So, if we do this:
ggplot(counts[order(counts$nationality),],
aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
It results in a consistent stack order across all years.
For further illustration, here is what happens with a random row order:
set.seed(123)
ggplot(counts[sample(nrow(counts)),],
aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
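To see what the sorting does, here is a small sketch on hypothetical data built to mimic counts.csv: two nationalities per year, with Flemish listed first up to 1605 and Dutch first from 1606. It checks which nationality appears first in each year before and after ordering the rows. The data frame and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for counts.csv: two rows per year, and the row order
# of the two nationalities flips after 1605
counts <- do.call(rbind, lapply(1600:1610, function(y) {
  nat <- if (y <= 1605) c("Flemish", "Dutch") else c("Dutch", "Flemish")
  data.frame(year = y, nationality = nat, artists_strict = c(10, 20),
             stringsAsFactors = FALSE)
}))

# Nationality listed first in each year, as the rows currently stand
first_before <- tapply(counts$nationality, counts$year, function(x) x[1])

# Sort by nationality (then year) so every year lists the groups in the same order
counts_sorted <- counts[order(counts$nationality, counts$year), ]
first_after <- tapply(counts_sorted$nationality, counts_sorted$year, function(x) x[1])

p <- ggplot(counts_sorted,
            aes(x = year, y = artists_strict, fill = factor(nationality))) +
  geom_area()
```

Before sorting, first_before contains both "Flemish" and "Dutch"; after sorting, every year starts with "Dutch". With the real data, replace the made-up counts with the read.csv() call from the question.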
As randy said, ggplot2 2.2.0 does automatic ordering. If you want to change the order, just reorder the factors used for fill. If you want to switch which group is on top in the legend but not the plot, you can use scale_fill_manual() with the limits option.
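A minimal sketch of controlling the stack order through the factor levels rather than the row order. The data are hypothetical (only the column names follow the question), and the rows are deliberately shuffled to show that row order no longer matters.

```r
library(ggplot2)

# Hypothetical data in the same shape as counts.csv
set.seed(1)
counts <- data.frame(year = rep(1600:1610, times = 2),
                     nationality = rep(c("Dutch", "Flemish"), each = 11),
                     artists_strict = sample(5:30, 22, replace = TRUE),
                     stringsAsFactors = FALSE)
counts <- counts[sample(nrow(counts)), ]  # scramble the row order on purpose

# The level order decides the stacking: in ggplot2 >= 2.2.0 the first level
# is drawn on top, matching the legend order
counts$nationality <- factor(counts$nationality, levels = c("Flemish", "Dutch"))

p <- ggplot(counts, aes(x = year, y = artists_strict, fill = nationality)) +
  geom_area()
```

Swapping the two entries in levels = c(...) swaps which area is on top, in both the plot and the legend.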
(Code to generate ggplot colors from John Colby)
gg_color_hue <- function(n) {
hues = seq(15, 375, length = n + 1)
hcl(h = hues, l = 65, c = 100)[1:n]
}
cols <- gg_color_hue(2)
Default ordering in legend
ggplot(counts,
aes(x = year, y = artists_strict, fill = factor(nationality))) +
geom_area()+
scale_fill_manual(values=c("Dutch" = cols[1],"Flemish"=cols[2]),
limits=c("Dutch","Flemish"))
Reversed ordering in legend
ggplot(counts,
aes(x = year, y = artists_strict, fill = factor(nationality))) +
geom_area()+
scale_fill_manual(values=c("Dutch" = cols[1],"Flemish"=cols[2]),
limits=c("Flemish","Dutch"))
Reversed ordering in plot and legend
counts$nationality <- factor(counts$nationality, levels = c("Flemish", "Dutch"))
ggplot(counts,
aes(x = year, y = artists_strict, fill = factor(nationality))) +
geom_area()+
scale_fill_manual(values=c("Dutch" = cols[1],"Flemish"=cols[2]),
limits=c("Flemish","Dutch"))
this should do it for you
ggplot(counts[order(counts$nationality),],
aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
hope this helps
I was wondering how I can colour geom_hex by a variable instead of by count, using a heat scale. I am also having overfitting in my actual model and was wondering how to eliminate that. Here's an example:
'''
ggplot(data = diamonds) +
geom_hex(mapping = aes(x = x, y = price, fill = depth), bins = 25) +
scale_fill_continuous(type = "viridis")
'''
Thanks!
I think this will do the trick, assuming you want to colour the hexagons according to the mean of depth...
ggplot(diamonds, aes(x = x, y = price, z = depth)) +
stat_summary_hex(fun = mean, bins = 25) +
scale_fill_continuous(type = "viridis")
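If the hexbin package (which stat_summary_hex() relies on) is not available, or square tiles are acceptable, stat_summary_2d() follows the same idea: it applies fun to the z aesthetic within each rectangular bin. This is a swapped-in variant, not the hexagonal version above; the last line pulls the per-tile means out of the built plot.

```r
library(ggplot2)

# Square-bin variant: stat_summary_2d() summarises z within each bin, so the
# fill of every tile is the mean depth of the diamonds that fall into it
p <- ggplot(diamonds, aes(x = x, y = price, z = depth)) +
  stat_summary_2d(fun = mean, bins = 25) +
  scale_fill_viridis_c(name = "mean depth")

# The computed summaries are in the `value` column of the built layer
tile_means <- ggplot_build(p)$data[[1]]$value
```

Any function of a numeric vector can be passed as fun, for example median, in either stat_summary_2d() or stat_summary_hex().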
I'm really new to R and I'm trying to plot NOx air pollution data from 5 different locations (I have monthly averages for every location from 01-1996 to 12-2019). Each line in the plot should represent a different location.
I've created a ggplot, but I find it really hard to read. I would like your tips on making the plot more readable (it will be no bigger than A4, because it will be included in my work and printed). I would also like to have more year labels on the x axis (1996, 1997, 1998, and so on).
library(readr)
library(ggplot2)
ALIBA <- read_csv("ALIBA_Praha/NOx/all_sorted.csv")
BMISA <- read_csv("BMISA_Mikulov/NOx/all_sorted.csv")
CCBDA <- read_csv("CCBDA_CB/NOx/all_sorted.csv")
TKARA <- read_csv("TKARA_Karvina/NOx/all_sorted.csv")
UULKA <- read_csv("UULKA_UnL/NOx/all_sorted.csv")
ggplot() +
geom_line(data = ALIBA, aes(x = START_TIME, y = VALUE), color = "blue") +
geom_line(data = BMISA, aes(x = START_TIME, y = VALUE), color = "red") +
geom_line(data = CCBDA, aes(x = START_TIME, y = VALUE), color = "yellow") +
geom_line(data = TKARA, aes(x = START_TIME, y = VALUE), color = "green") +
geom_line(data = UULKA, aes(x = START_TIME, y = VALUE), color = "pink")
All CSV files have this format:
START_TIME,VALUE
1996-01-01T00:00:00Z,61.3049451304964
1996-02-01T00:00:00Z,47.7234010245664
1996-03-01T00:00:00Z,33.083512309072
1996-04-01T00:00:00Z,47.771166691758
1996-05-01T00:00:00Z,24.7022422574005
1996-06-01T00:00:00Z,25.4495954480684
1996-07-01T00:00:00Z,23.301224242488
...
Thanks
First, I would combine all data sets into one data frame with a Location column:
library(readr)
library(ggplot2)
ALIBA <- read_csv("ALIBA_Praha/NOx/all_sorted.csv")
ALIBA$Location <- "ALIBA"
BMISA <- read_csv("BMISA_Mikulov/NOx/all_sorted.csv")
BMISA$Location <- "BMISA"
CCBDA <- read_csv("CCBDA_CB/NOx/all_sorted.csv")
CCBDA$Location <- "CCBDA"
TKARA <- read_csv("TKARA_Karvina/NOx/all_sorted.csv")
TKARA$Location <- "TKARA"
UULKA <- read_csv("UULKA_UnL/NOx/all_sorted.csv")
UULKA$Location <- "UULKA"
df <- rbind(ALIBA, BMISA, CCBDA, TKARA, UULKA)
ggplot(data = df, aes(x = START_TIME, y = VALUE, color = Location)) +
  geom_line(linewidth = 1) + # play with the stroke thickness
  scale_color_brewer(palette = "Set1") # many palettes available; see RColorBrewer::display.brewer.all()
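Instead of repeating the read-and-label lines five times, the data sets can be built in a loop and stacked with do.call(rbind, ...). The sketch below uses a hypothetical fake_station() helper in place of read_csv() so it runs without the files; with the real data, swap in read_csv(paths[i]), where paths is a hypothetical vector holding the five file paths from the question.

```r
# Hypothetical stand-in for read_csv(): one monthly series, 01-1996 to 12-2019
fake_station <- function(seed) {
  set.seed(seed)
  months <- seq(as.POSIXct("1996-01-01", tz = "UTC"),
                as.POSIXct("2019-12-01", tz = "UTC"), by = "month")
  data.frame(START_TIME = months, VALUE = runif(length(months), 10, 80))
}

stations <- c("ALIBA", "BMISA", "CCBDA", "TKARA", "UULKA")

# Build each station's data, tag its rows with the station code, then stack
df <- do.call(rbind, lapply(seq_along(stations), function(i) {
  d <- fake_station(i)        # real data: read_csv(paths[i])
  d$Location <- stations[i]
  d
}))
```

The result has one row per station and month (5 × 288 rows here), which is the long format ggplot() expects for color = Location.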
How would you like to add more years? In the same graph (everything will be tiny) or in separate "windows" (= facets, better)?
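For the two wishes in the question (yearly tick labels and a figure readable at A4 size), a sketch with one facet per location could look like this. The data are randomly generated in the same long format as df (START_TIME as a date-time, VALUE, Location); the line width, font size and label angle are guesses to tune, not values from the original answer.

```r
library(ggplot2)

# Randomly generated data in the same long format as df
set.seed(1)
months <- seq(as.POSIXct("1996-01-01", tz = "UTC"),
              as.POSIXct("2019-12-01", tz = "UTC"), by = "month")
stations <- c("ALIBA", "BMISA", "CCBDA", "TKARA", "UULKA")
df <- data.frame(START_TIME = rep(months, times = length(stations)),
                 VALUE = runif(length(months) * length(stations), 10, 80),
                 Location = rep(stations, each = length(months)))

p <- ggplot(df, aes(x = START_TIME, y = VALUE)) +
  geom_line(linewidth = 0.3) +
  facet_wrap(~ Location, ncol = 1) +                 # one panel per station
  scale_x_datetime(date_breaks = "1 year",           # a tick for every year
                   date_labels = "%Y") +
  labs(x = NULL, y = "NOx (monthly mean)") +
  theme_bw(base_size = 9) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

# For print: ggsave("nox.pdf", p, width = 210, height = 297, units = "mm")  # A4 portrait
```

Stacking the panels in one column keeps the shared time axis aligned, and saving at the final print size (A4 is 210 × 297 mm) shows how legible the labels will actually be.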
I have a boxplot with multiple groups in R.
When I add the dots within the boxplots, they are not in the center.
Since each week has a different number of boxplots, the dots are not centered within the boxes.
The problem is in the geom_point part.
I uploaded my df.m data in a text file, and a figure of what I get.
I am using ggplot, and here is my code:
library(ggplot2)
setwd("/home/usuario")
df.m = read.table("df.m.txt")
df.m$variable <- as.factor(df.m$variable)
give.n = function(elita){
return(c(y = median(elita)*-0.1, label = length(elita)))
}
p = ggplot(data = df.m, aes(x=variable, y=value))
p = p + geom_boxplot(aes(fill = Label))
p = p + geom_point(aes(fill = Label), shape = 21,
position = position_jitterdodge(jitter.width = 0))
p = p + stat_summary(fun.data = give.n, geom = "text")
p
Here is my data in a text file:
https://drive.google.com/file/d/1kpMx7Ao01bAol5eUC6BZUiulLBKV_rtH/view?usp=sharing
Only variable 12 is centered, because it has all 3 groups (the maximum possible).
I would also like to show the number of observations. With the code shown, I only get the number of observations for all the groups together. I would like to add the count for EACH GROUP.
Thank you in advance
Here's a solution using boxplot and dotplot and an example dataset:
library(tidyverse)
# example data
dt <- data.frame(week = c(1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2),
value = c(6.40,6.75,6.11,6.33,5.50,5.40,5.83,4.57,5.80,
6.00,6.11,6.40,7.00,3,5.44,6.00,5,6.00),
donor_type = c("A","A","A","A","CB","CB","CB","CB","CB",
"CB","CB","CB","CB","CB","A","A","A","A"))
# create the plot
ggplot(dt, aes(x = factor(week), y = value, fill = donor_type)) +
geom_boxplot() +
geom_dotplot(binaxis='y', stackdir='center', position = position_dodge(0.75))
You should be able to adjust my code to your real dataset easily.
Edited answer with OP's dataset:
Using geom_point() with the OP's df.m:
library(tidyverse)
df.m <- df.m %>%
mutate(variable = as.factor(variable)) %>%
filter(!is.na(value))
ggplot(df.m, aes(x = variable, y = value, fill = Label)) +
geom_boxplot() +
geom_point(shape = 21, position = position_jitterdodge(jitter.width = 0)) +
scale_x_discrete("variable", drop = FALSE)
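A sketch of one way to handle both parts of the question (centred points when a week has fewer boxes, and a count per box), using hypothetical data in which donor type B appears only in week 1. The points use position_dodge(width = 0.75), the same dodge the geom_dotplot() answer uses, which is assumed here to line up with geom_boxplot()'s default box width. The counts are computed beforehand with aggregate() and drawn with the same dodge.

```r
library(ggplot2)

# Hypothetical data: donor type "B" only occurs in week 1, so week 2 has
# two boxes instead of three
dt <- data.frame(
  week = rep(c(1, 2), times = c(9, 6)),
  donor_type = c(rep(c("A", "B", "C"), each = 3), rep(c("A", "C"), each = 3)),
  value = c(6.4, 6.8, 6.1, 5.5, 5.4, 5.8, 4.6, 5.8, 6.0,
            6.0, 6.1, 6.4, 7.0, 3.0, 5.4)
)

# Number of observations in each week x donor_type box
n_df <- aggregate(value ~ week + donor_type, data = dt, FUN = length)
names(n_df)[names(n_df) == "value"] <- "n"
n_df$y <- min(dt$value) - 0.3    # put the labels just below the lowest point

# position_dodge() splits the width among the groups present at each x
dodge <- position_dodge(width = 0.75)

p <- ggplot(dt, aes(x = factor(week), y = value, fill = donor_type)) +
  geom_boxplot() +
  geom_point(shape = 21, position = dodge) +
  geom_text(data = n_df, aes(y = y, label = paste0("n=", n)),
            position = dodge, size = 3)
```

Because the text layer inherits fill = donor_type, its labels are grouped and dodged the same way as the boxes, one "n=" label per box.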
Let's say I have the following data frame:
library(ggplot2)
set.seed(101)
n=10
df<- data.frame(delta=rep(rep(c(0.1,0.2,0.3),each=3),n), metric=rep(rep(c('P','R','C'),3),n),value=rnorm(9*n, 0.0, 1.0))
My goal is to do a boxplot by multiple factors:
p<- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(fill=factor(metric)))
The output is:
So far so good, but if I do:
p+ geom_point(aes(color = factor(metric)))
I get:
I do not know what it is doing. My goal is to color the outliers as it is done here. Note that this solution changes the inside color of the boxes to white and set the border to different colors. I want to keep the same color of the boxes while having the outliers inherit those colors. I want to know how to make the outliers get the same colors from their respective boxplots.
Do you just want to change the outliers' colour? If so, you can do it easily by drawing the boxplot twice.
p <- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(colour=factor(metric))) +
geom_boxplot(aes(fill=factor(metric)), outlier.colour = NA)
# outlier.shape = 21 # if you want a border
[EDITED]
colss <- c(P="firebrick3",R="skyblue", C="mediumseagreen")
p + scale_colour_manual(values = colss) + # outliers colours
scale_fill_manual(values = colss) # boxes colours
# the development version (2.1.0.9001) of geom_boxplot() has an outlier.fill argument,
# so I expect the code below to return similar output in a future release.
p2 <- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(fill=factor(metric)), outlier.shape = 21, outlier.colour = NA)
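In current ggplot2 releases, a filled point shape for the outliers is enough to make them take the box's fill, because outlier.fill falls back to the box fill when left unset. The sketch below uses the question's data; the is_outlier() helper is hypothetical and simply re-derives which points get drawn as outliers, using the 1.5 * IQR rule on default quantile() quartiles that stat_boxplot() applies.

```r
library(ggplot2)

set.seed(101)
n <- 10
df <- data.frame(delta = rep(rep(c(0.1, 0.2, 0.3), each = 3), n),
                 metric = rep(rep(c("P", "R", "C"), 3), n),
                 value = rnorm(9 * n, 0.0, 1.0))

# Shape 21 is a filled circle; with outlier.fill unset it takes the box's fill
p <- ggplot(df, aes(x = factor(delta), y = value, fill = factor(metric))) +
  geom_boxplot(outlier.shape = 21, outlier.size = 2)

# Hypothetical helper: flag values more than 1.5 * IQR beyond the quartiles
# (default quantile() type 7), computed within one box
is_outlier <- function(y) {
  q <- quantile(y, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  y < q[1] - 1.5 * iqr | y > q[2] + 1.5 * iqr
}

# Apply it per delta x metric box; ave() returns numbers, so convert back
df$outlier <- as.logical(ave(df$value, df$delta, df$metric, FUN = is_outlier))
```

df[df$outlier, ] lists exactly the points drawn as outliers, which is handy if you want to label or recolour them in a separate layer.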
Maybe this:
ggplot(data = df, aes(x = as.factor(delta), y = value,fill=as.factor(metric))) +
geom_boxplot(outlier.size = 1)+ geom_point(pch = 21,position=position_jitterdodge(jitter.width=0))
I'm hoping to use ggplot2 to generate a set of stacked bars in pairs, much like this:
With the following example data:
df <- expand.grid(name = c("oak","birch","cedar"),
sample = c("one","two"),
type = c("sapling","adult","dead"))
df$count <- sample(5:200, size = nrow(df), replace = TRUE)
I would want the x-axis to represent the name of the tree, with two bars per tree species: one bar for sample one and one bar for sample two. Then the colors of each bar should be determined by type.
The following code generates the stacked bar with colors by type:
ggplot(df, aes(x = name, y = count, fill = type)) + geom_bar(stat = "identity")
And the following code generates the dodged bars by sample:
ggplot(df, aes(x = name, y = count, group = sample)) + geom_bar(stat = "identity", position = "dodge")
But I can't get it to dodge one of the groupings (sample) and stack the other grouping (type):
ggplot(df, aes(x = name, y = count, fill = type, group = sample)) + geom_bar(stat = "identity", position = "dodge")
One workaround would be to put the interaction of sample and name on the x axis and then adjust the x-axis labels. The problem is that the bars are evenly spaced, so the two bars for each tree are not placed close to each other.
ggplot(df, aes(x = as.numeric(interaction(sample,name)), y = count, fill = type)) +
geom_bar(stat = "identity",color="white") +
scale_x_continuous(breaks=c(1.5,3.5,5.5),labels=c("oak","birch","cedar"))
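A variation on this workaround that keeps each pair together: place each tree at 1, 2 and 3, shift sample one slightly left and sample two slightly right, then narrow the bars. The offset of 0.2 and width of 0.38 are arbitrary choices; the data are generated as in the question, with a seed added so the sketch is reproducible.

```r
library(ggplot2)

set.seed(42)
df <- expand.grid(name = c("oak", "birch", "cedar"),
                  sample = c("one", "two"),
                  type = c("sapling", "adult", "dead"))
df$count <- sample(5:200, size = nrow(df), replace = TRUE)

# Trees sit at x = 1, 2, 3; sample "one" is shifted left and "two" right
offset <- c(one = -0.2, two = 0.2)
df$xpos <- as.numeric(df$name) + unname(offset[as.character(df$sample)])

p <- ggplot(df, aes(x = xpos, y = count, fill = type)) +
  geom_col(width = 0.38, colour = "white") +  # bars at the same xpos stack by type
  scale_x_continuous(name = "name", breaks = 1:3, labels = levels(df$name))
```

With these numbers the two bars of a pair are 0.02 apart while neighbouring pairs are 0.22 apart, so each tree reads as one group.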
Another solution is to facet by name and put sample on the x axis.
ggplot(df,aes(x=sample,y=count,fill=type))+
geom_bar(stat = "identity",color="white")+
facet_wrap(~name,nrow=1)
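To make the faceted version look more like a single grouped bar chart, the strips can be moved below the panels and placed outside the axis, with the gaps between panels reduced. A sketch, assuming ggplot2 2.2.0 or later (for strip.position and strip.placement) and data generated as in the question:

```r
library(ggplot2)

set.seed(42)
df <- expand.grid(name = c("oak", "birch", "cedar"),
                  sample = c("one", "two"),
                  type = c("sapling", "adult", "dead"))
df$count <- sample(5:200, size = nrow(df), replace = TRUE)

p <- ggplot(df, aes(x = sample, y = count, fill = type)) +
  geom_col(colour = "white") +
  facet_wrap(~ name, nrow = 1, strip.position = "bottom") +  # tree names under the bars
  theme(strip.placement = "outside",           # strips below the sample labels
        strip.background = element_blank(),
        panel.spacing = unit(0.2, "lines")) +  # narrower gaps between trees
  labs(x = NULL)
```

Each panel then shows the "one" and "two" bars side by side with the tree name underneath, which is close to the paired stacked-bar layout asked for.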