ggplot reorders my factors - r

I am plotting a bar and line chart using a background theme from ggthemes. My variables are grouped by an ordered factor that I set. When I don't use the theme, the factors order the way I want them. But when I add a ggtheme, the order for the line changes, as can be seen in the legend. Why is this happening and how do I fix it?
Example code:
testCount %>%
ggplot(aes(x = tests)) +
theme_solarized_2(light = F) + scale_colour_solarized('blue') +
geom_bar(aes(y = ..prop.., fill = BandType), position = "dodge") +
stat_ecdf(aes(color = BandType), size = 1) +
scale_x_continuous(breaks = seq(0, 18, 1)) +
scale_y_continuous(breaks = seq(0, 1, 0.1), limits = c(0, 1), labels = percent)
Here is my desired output, where factors are ordered in bar and line chart:
And here is the undesired plot, where factor changes order in the line chart:
EDIT: adding theme_solarized_2(light = F) + scale_fill_solarized('blue') + scale_color_solarized('blue') made the factor ordering consistent. Thanks!
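A minimal sketch of the fix described in the edit, assuming the ggthemes package is installed. The data frame below is made up for illustration (the original data was not posted), and after_stat(prop) is used as the current spelling of ..prop... A likely explanation is that only the colour scale had been switched to the solarized palette, so the fill legend kept a different palette and the two legends no longer matched. Setting both scales with the same accent keeps them in step.

```r
library(ggplot2)
library(ggthemes)

# Hypothetical stand-in for the asker's data (not from the original post)
set.seed(1)
testCount <- data.frame(
  tests = sample(0:18, 300, replace = TRUE),
  BandType = factor(
    sample(c("Low", "Mid", "High"), 300, replace = TRUE),
    levels = c("Low", "Mid", "High")
  )
)

p <- ggplot(testCount, aes(x = tests)) +
  theme_solarized_2(light = FALSE) +
  # set BOTH scales so bars and lines share one palette and one level order
  scale_fill_solarized("blue") +
  scale_colour_solarized("blue") +
  geom_bar(aes(y = after_stat(prop), fill = BandType), position = "dodge") +
  stat_ecdf(aes(colour = BandType), linewidth = 1) +
  scale_x_continuous(breaks = seq(0, 18, 1))
```

Printing p should then show bars and step lines that use the same colour for each BandType level.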

Related

labels on a stacked density plot

I'm generating a stacked density plot:
ggplot(data=tydy_rawdata, aes(x=timepoint, y=tpm, group=fct_inorder(names),
fill=fct_inorder(names))) +
geom_density(position="fill",
stat="identity") +
scale_fill_manual(values = rev(mycolors))
plot :
I would like to add label on each curve (or at least the top 3 or 4) basing on the "names" displayed on the right.
I tried adding geom_text with the code below, but the result is not what I want:
ggplot(data=tydy_rawdata, aes(x=timepoint, y=tpm, group=fct_inorder(names),
fill=fct_inorder(names))) +
geom_density(position="fill",
stat="identity") +
geom_text(aes(label=names)) +
scale_fill_manual(values = rev(mycolors))
plot :
Is there some way to do it?
First, your chart is a stacked area chart, i.e. geom_density with stat="identity" is equal to geom_area. Second, when adding labels via geom_text you have to take account of the position argument. As you use position="fill" for your density/area chart you also have to do the same for geom_text.
As you provided no example data I created my own to make your issue reproducible:
library(ggplot2)
library(forcats)
set.seed(123)
tydy_rawdata <- data.frame(
names = rep(LETTERS[1:10], each = 6),
timepoint = factor(seq(6)),
tpm = runif(6 * 10, 0, 80)
)
ggplot(data = tydy_rawdata, aes(
x = timepoint, y = tpm,
group = fct_inorder(names), fill = fct_inorder(names)
)) +
geom_area(
position = "fill",
color = "black"
) +
geom_text(aes(label = names), position = "fill")
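With position = "fill" on its own, each label sits at the upper edge of its band. One possible refinement, sketched here with the same kind of made-up data, is position_fill(vjust = 0.5), which places each label at the vertical middle of its band. The numeric time axis and the smaller text size are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(123)
# same made-up data shape as in the answer, with a numeric time axis
tydy_rawdata <- data.frame(
  names = rep(LETTERS[1:10], each = 6),
  timepoint = rep(1:6, times = 10),
  tpm = runif(6 * 10, 0, 80)
)
# keep the series in order of first appearance (what fct_inorder() does)
tydy_rawdata$names <- factor(tydy_rawdata$names,
                             levels = unique(tydy_rawdata$names))

p <- ggplot(tydy_rawdata,
            aes(x = timepoint, y = tpm, group = names, fill = names)) +
  geom_area(position = "fill", colour = "black") +
  # centre every label vertically inside its band
  geom_text(aes(label = names),
            position = position_fill(vjust = 0.5), size = 3)
```

To label only a few series, the same geom_text layer could be given a filtered data frame, provided the stacking is still computed on all series.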

Use free_y scale on first axis and fixed on second + facet_grid + ggplot2

Is there any method to set scale = 'free_y' on the left hand (first) axis in ggplot2 and use a fixed axis on the right hand (second) axis?
I have a dataset where I need to use free scales for one variable and fixed for another but represent both on the same plot. To do so I'm trying to add a second, fixed, y-axis to my data. The problem is I cannot find any method to set a fixed scale for the 2nd axis and have that reflected in the facet grid.
This is the code I have so far to create the graph -
#plot weekly seizure date
p <- ggplot(dfspw_all, aes(x=WkYr, y=Seizures, group = 1)) + geom_line() +
xlab("Week Under Observation") + ggtitle("Average Seizures per Week - To Date") +
geom_line(data = dfsl_all, aes(x =WkYr, y = Sleep), color = 'green') +
scale_y_continuous(
# Features of the first axis
name = "Seizures",
# Add a second axis and specify its features
sec.axis = sec_axis(~.[0:20], name="Sleep")
)
p + facet_grid(vars(Name), scales = "free_y") +
theme(axis.ticks.x=element_blank(),axis.text.x = element_blank())
This is what it is producing (some details omitted from code for simplicity) -
What I need is for the scale on the left to remain "free" and the scale on the right to range from 0-24.
Secondary axes are implemented in ggplot2 as a decoration that is a transformation of the primary axis, so I don't know an elegant way to do this, since it would require the secondary axis formula to be aware of different scaling factors for each facet.
Here's a hacky approach where I scale each secondary series to its respective primary series, and then add some manual annotations for the secondary series. Another way might be to make the plots separately for each facet like here and use patchwork to combine them.
Given some fake data where the facets have different ranges for the primary series but the same range for the secondary series:
library(tidyverse)
fake <- tibble(facet = rep(1:3, each = 10),
x = rep(1:10, times = 3),
y_prim = (1+sin(x))*facet/2,
y_sec = (1 + sin(x*3))/2)
ggplot(fake, aes(x, y_prim)) +
geom_line() +
geom_line(aes(y= y_sec), color = "green") +
facet_wrap(~facet, ncol = 1)
...we could scale each secondary series to its primary series, and add custom annotations for that secondary series:
fake2 <- fake %>%
group_by(facet) %>%
mutate(y_sec_scaled = y_sec/max(y_sec) * (max(y_prim))) %>%
ungroup()
fake2_labels <- fake %>%
group_by(facet) %>%
summarize(max_prim = max(y_prim), baseline = 0, x_val = 10.5)
ggplot(fake2, aes(x, y_prim)) +
geom_line() +
geom_line(aes(y= y_sec_scaled), color = "green") +
facet_wrap(~facet, ncol = 1, scales = "free_y") +
geom_text(data = fake2_labels, aes(x = x_val, y = max_prim, label = "100%"),
hjust = 0, color = "green") +
geom_text(data = fake2_labels, aes(x = x_val, y = baseline, label = "0%"),
hjust = 0, color = "green") +
coord_cartesian(xlim = c(0, 10), clip = "off") +
theme(plot.margin = unit(c(1,3,1,1), "lines"))
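The same rescaling idea can be adapted to the fixed 0-24 range from the question. The sketch below uses base R and invented column names (name, seizures, sleep), so treat it as an illustration of the arithmetic rather than a drop-in solution. Each sleep value is expressed as a fraction of 24 and then stretched to that facet's maximum of the primary series, so 24 always lands at the top of the panel.

```r
# Hypothetical data: two facets with very different primary ranges
df <- data.frame(
  name     = rep(c("A", "B"), each = 4),
  seizures = c(1, 3, 2, 4, 10, 40, 20, 30),
  sleep    = c(6, 8, 7, 12, 6, 8, 7, 12)
)

sleep_max <- 24  # fixed upper limit wanted on the secondary axis

# per-facet maximum of the primary series, repeated for every row
prim_max <- ave(df$seizures, df$name, FUN = max)

# map sleep from [0, 24] onto [0, facet maximum]
df$sleep_scaled <- df$sleep / sleep_max * prim_max

# positions (in primary units) where a manual "12" label would go per facet
label_pos <- unique(data.frame(name = df$name, prim_max = prim_max))
label_pos$at_12 <- 12 / sleep_max * label_pos$prim_max
```

The sleep_scaled column would be plotted in place of the raw secondary series, and label_pos would feed geom_text in the same way as fake2_labels above.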

Set the width and gap in geom_bar in a large dataset with a lot of unique values

I have the dataframe below:
res<-sample.int(2187, 2187)
freq<-floor(runif(2187, 95,105))
t<-data.frame(res,freq)
and I'm trying to create a bar chart from it. Despite using the width and color arguments, I still cannot create space between the bars, and the bars appear black instead of the selected fill.
library(ggplot2)
require(scales)
ggplot(t,width=0.1)+
geom_bar(aes(x=res,y=freq ,fill = (t$res==101)),
color = "black",stat = "identity") +
scale_fill_manual(values=c("darkblue", "lightblue"), guide = F) +
theme_classic(base_size = 16)+ theme(legend.position = "none")+
scale_x_discrete(breaks = seq(80, 115, 5))+ scale_y_continuous(labels = comma)
Note that this code works nice for a dataset with much fewer unique values like:
fac<-factor(rep(c(80,85,100,100.5,100.7,101,101.5,110,105),2000000))
res<-data.frame(fac)
new<-data.frame(table(res))
require(scales)
ggplot(new,width=0.1)+
geom_bar(aes(x=res,y=Freq ,fill = (new$res==101)),
color = "black",stat = "identity") +
scale_fill_manual(values=c("darkblue", "lightblue"), guide = F) +
theme_classic(base_size = 16)+ theme(legend.position = "none")+
scale_x_discrete(breaks = seq(80, 115, 5))+ scale_y_continuous(labels = comma)
Maybe I am completely wrong, but if I understand correctly, the OP wants to reproduce the second chart from scratch using a sample of random numbers instead of already tabulated counts.
To create a histogram / bar chart, we only need a vector of random numbers (wrapped in a data.frame for ggplot) and can let geom_bar() do the counting. In addition, a particular bar will be highlighted.
By using floor(), the random numbers are already binned but are still treated as continuous by ggplot(). Therefore, they need to be turned into a factor.
# create data
set.seed(123L) # ensure random data are reproducible
t <- data.frame(res = floor(runif(2187, 95, 105)))
library(ggplot2)
ggplot(t) +
aes(x = as.factor(res), fill = res == 101) +
geom_bar() +
theme_classic(base_size = 16) +
scale_fill_manual(values = c("darkblue", "lightblue"), guide = FALSE) +
xlab("res") +
ylab("freq")
Edit: geom_histogram()
There is an alternative approach using geom_histogram().
geom_histogram() does all steps in one go: The binning (no need to use floor()) as well as counting and plotting:
set.seed(123L) # ensure random data are reproducible
t2 <- data.frame(res = runif(2187, 95,105)) # floor() omitted here
ggplot(t2) +
aes(x = res, fill = floor(res) == 101) +
geom_histogram(breaks = seq(95, 105, 1), closed = "left") +
theme_classic(base_size = 16) +
scale_fill_manual(values = c("darkblue", "lightblue"), guide = FALSE) +
xlab("res") +
ylab("freq")
Here, the breaks parameter was used to specify the bin boundaries explicitly. Alternatively, the number of bins or the width of the bins can be specified. This gives flexibility to play around with the parameters.
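As an illustration of the bin-width variant, here is a sketch that assumes the same simulated data as above. The boundary = 95 argument is an added assumption so that the bin edges fall on whole numbers, and guide = "none" is used as the current way to suppress the legend.

```r
library(ggplot2)

set.seed(123L)  # ensure random data are reproducible
t2 <- data.frame(res = runif(2187, 95, 105))

p <- ggplot(t2) +
  aes(x = res, fill = floor(res) == 101) +
  # bins of width 1 with edges on whole numbers, left-closed as before
  geom_histogram(binwidth = 1, boundary = 95, closed = "left") +
  scale_fill_manual(values = c("darkblue", "lightblue"), guide = "none") +
  theme_classic(base_size = 16)
```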
Edit 2
The OP has asked about the case where the random numbers are uniformly distributed between 100 and 1015. With an adjustment to the sequence of breaks,
set.seed(123L) # ensure random data are reproducible
t3 <- data.frame(res = runif(2187, 100, 1015))
ggplot(t3) +
aes(x = res, fill = floor(res) == 101) +
geom_histogram(breaks = seq(100, 1015, 1), closed = "left") +
theme_classic(base_size = 16) +
scale_fill_manual(values = c("darkblue", "lightblue"), guide = FALSE) +
xlab("res") +
ylab("freq")
returns
This chart contains over 900 bars, one for each bin of width 1, and not all of them are visible, depending on the screen resolution, as already explained by Jon Spring.
Therefore, it might be more suitable to reduce the number of bins, e.g., to 100 bins:
ggplot(t3) +
aes(x = res, fill = floor(res) == 101) +
geom_histogram(bins = 100L) +
theme_classic(base_size = 16) +
scale_fill_manual(values = c("darkblue", "lightblue"), guide = FALSE) +
xlab("res") +
ylab("freq")
Please note that 101 is still highlighted in the lower left corner.
Edit -- added alternate solutions at bottom.
If you have over 2,000 bars, and each one has a black outline 1 pixel wide on each side, that'll take something on the order of 6,000 horizontal pixels (ignoring anti-aliasing) to see one with a different fill. Most screens have much lower resolution than that.
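As a rough back-of-the-envelope check, assuming one pixel for each outline edge and at least one pixel of fill per bar (a simplification that ignores anti-aliasing and bar spacing):

```r
n_bars <- 2187            # rows in the example data
px_per_bar <- 1 + 1 + 1   # left outline + fill + right outline
min_width_px <- n_bars * px_per_bar
min_width_px              # 6561
```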
If you must use bars, and must show every value, one option would be to drop the outline with color = NA and set width = 1 (as a term in the geom_col/geom_bar call) so there's no distracting blank space between bars. Even then, the different color at res == 101 is only visible at certain resolutions. (That might vary on device settings and anti-aliasing.)
ggplot(t)+
geom_col(aes(x=res,y=freq , fill = (res==101)),
color = NA, width = 1) +
scale_fill_manual(values=c("darkblue", "lightblue"), guide = F) +
theme_classic(base_size = 16) +
scale_x_continuous(breaks = c(500*0:4, 101))
If you must show all 2000 points, but want to highlight one, it might make sense to use a different geom that spreads the data out to use more of the available space.
For instance, we might use geom_point or geom_jitter to plot all the coordinates in 2d space. Here, I highlight the element with res == 101. I use arrange to make sure the special dot gets plotted last so that it doesn't get occluded.
library(dplyr)
ggplot(t %>% arrange(res == 101),
aes(x = res, y = freq,
fill = res == 101,
size = res == 101)) +
geom_jitter(shape = 21, stroke = 0.1)
Or we might plot the data as a line, highlighting the special dot on its own:
ggplot(t, aes(res, freq)) +
geom_line(color = "gray70") +
geom_point(data = subset(t, res == 101)) +
expand_limits(y=0)

How can I plot 2 related variables on the same axis using ggplot? [duplicate]

Edit: This question has been marked as duplicated, but the responses here have been tried and did not work because the case in question is a line chart, not a bar chart. Applying those methods produces a chart with 5 lines, 1 for each year - not useful. Did anyone who voted to mark as duplicate actually try those approaches on the sample dataset supplied with this question? If so please post as an answer.
Original Question:
There's a feature in Excel pivot charts which allows multilevel categorical axes. I'm trying to find a way to do the same thing with ggplot (or any other plotting package in R).
Consider the following dataset:
set.seed(1)
df=data.frame(year=rep(2009:2013,each=4),
quarter=rep(c("Q1","Q2","Q3","Q4"),5),
sales=40:59+rnorm(20,sd=5))
If this is imported to an Excel pivot table, it is straightforward to create the following chart:
Note how the x-axis has two levels, one for quarter and one for the grouping variable, year. Are multilevel axes possible with ggplot?
NB: There is a hack with facets that produces something similar, but this is not what I'm looking for.
library(ggplot2)
ggplot(df) +
geom_line(aes(x=quarter,y=sales,group=year))+
facet_grid(.~year,scales="free")
New labels are added using annotate(geom = "text", ...). Turn off clipping of x axis labels with clip = "off" in coord_cartesian.
Use theme to add extra margins (plot.margin) and remove (element_blank()) x axis text (axis.title.x, axis.text.x) and vertical grid lines (panel.grid.x).
library(ggplot2)
ggplot(data = df, aes(x = interaction(year, quarter, lex.order = TRUE),
y = sales, group = 1)) +
geom_line(colour = "blue") +
annotate(geom = "text", x = seq_len(nrow(df)), y = 34, label = df$quarter, size = 4) +
annotate(geom = "text", x = 2.5 + 4 * (0:4), y = 32, label = unique(df$year), size = 6) +
coord_cartesian(ylim = c(35, 65), expand = FALSE, clip = "off") +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
See also the nice answer by @eipi10 here: Axis labels on two lines with nested x variables (year below months)
The code suggested by Henrik does work and helped me a lot! I think the solution is very valuable. But please be aware that there is a small mistake in the first line of the code, which results in the data being in the wrong order.
Instead of
... aes(x = interaction(year,quarter), ...
it should be
... aes(x = interaction(quarter,year), ...
The resulting graphic has the data in the right order.
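The reason the argument order matters can be checked in base R. By default, interaction() orders its levels so that the first factor varies fastest. This small sketch uses a two-year subset for brevity; note that the answer's code as shown above already passes lex.order = TRUE, which is another way to get the same year-then-quarter ordering.

```r
year    <- rep(2009:2010, each = 4)
quarter <- rep(c("Q1", "Q2", "Q3", "Q4"), 2)

# default: the first factor varies fastest, so years alternate within a quarter
lv_default <- levels(interaction(year, quarter))
head(lv_default, 4)  # "2009.Q1" "2010.Q1" "2009.Q2" "2010.Q2"

# swapping the arguments puts the quarters in sequence within each year
lv_swapped <- levels(interaction(quarter, year))
head(lv_swapped, 4)  # "Q1.2009" "Q2.2009" "Q3.2009" "Q4.2009"

# lex.order = TRUE keeps year first in the label and sorts by year, then quarter
lv_lex <- levels(interaction(year, quarter, lex.order = TRUE))
head(lv_lex, 4)      # "2009.Q1" "2009.Q2" "2009.Q3" "2009.Q4"
```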
P.S. I suggested an edit (which has been rejected so far) and, due to a small lack of reputation, I am not allowed to comment, which I would rather have done.
User Tung had a great answer on this thread
library(tidyverse)
library(lubridate)
library(scales)
set.seed(123)
df <- tibble(
date = as.Date(41000:42000, origin = "1899-12-30"),
value = c(rnorm(500, 5), rnorm(501, 10))
)
# create year column for facet
df <- df %>%
mutate(year = as.factor(year(date)))
p <- ggplot(df, aes(date, value)) +
geom_line() +
geom_vline(xintercept = as.numeric(df$date[yday(df$date) == 1]), color = "grey60") +
scale_x_date(date_labels = "%b",
breaks = pretty_breaks(),
expand = c(0, 0)) +
# switch the facet strip label to the bottom
facet_grid(.~ year, space = 'free_x', scales = 'free_x', switch = 'x') +
labs(x = "") +
theme_classic(base_size = 14, base_family = 'mono') +
theme(panel.grid.minor.x = element_blank()) +
# remove facet spacing on x-direction
theme(panel.spacing.x = unit(0,"line")) +
# switch the facet strip label to outside
# remove background color
theme(strip.placement = 'outside',
strip.background.x = element_blank())
p

How to order bars in faceted ggplot2 bar chart

If I want to order the bars in a ggplot2 barchart from largest to smallest, then I'd usually update the factor levels of the bar category, like so
one_group <- data.frame(
height = runif(5),
category = gl(5, 1)
)
o <- order(one_group$height, decreasing = TRUE)
one_group$category <- factor(one_group$category, levels = one_group$category[o])
p_one_group <- ggplot(one_group, aes(category, height)) +
geom_bar(stat = "identity")
p_one_group
If I have several groups of barcharts that I'd like in different facets, with each facet having bars ordered from largest to smallest (and different x-axes), then the technique breaks down.
Given some sample data
two_groups <- data.frame(
height = runif(10),
category = gl(5, 2),
group = gl(2, 1, 10, labels = letters[1:2])
)
and the plotting code
p_two_groups <- ggplot(two_groups, aes(category, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x")
p_two_groups
what do I need to do to get the bar ordering right?
If it helps, an equivalent problem to solve is: how do I update factor levels after I've done the faceting?
Here is a hack:
two_groups <- transform(two_groups, category2 = factor(paste(group, category)))
two_groups <- transform(two_groups, category2 = reorder(category2, rank(height)))
ggplot(two_groups, aes(category2, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
scale_x_discrete(labels=two_groups$category, breaks=two_groups$category2)
1. Make a UNIQUE factor variable for all entries (category2).
2. Reorder that variable based on the height.
3. Plot on the new variable: aes(x=category2).
4. Re-label the axis using the original value (category) for the variable (category2) in scale_x_discrete.
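The steps above can be sketched in base R as follows. Note that reorder(category2, rank(height)) sorts ascending; negating the height, as done here, is one way to get the largest-to-smallest order the question asks for. The seed and the helper objects by_level and heights_chk are illustrative additions, not part of the original answer.

```r
set.seed(42)  # illustrative seed, not in the original
two_groups <- data.frame(
  height   = runif(10),
  category = gl(5, 2),
  group    = gl(2, 1, 10, labels = letters[1:2])
)

# step 1: one unique key per bar
two_groups$category2 <- factor(paste(two_groups$group, two_groups$category))
# step 2: order the key's levels by decreasing height
two_groups$category2 <- reorder(two_groups$category2, -two_groups$height)

# check: read heights off in level order, split by facet
by_level <- two_groups[order(two_groups$category2), ]
heights_chk <- split(by_level$height, by_level$group)
heights_chk  # each vector should be strictly decreasing
```

Because facet_grid(scales = "free_x") drops the levels that do not occur in a panel, the global level order is enough to give a decreasing order inside each facet.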
Here is a hack to achieve what you want. I was unable to figure out how to get the category values below the tick marks. So if someone can help fix that, it would be wonderful. Let me know if this works
library(plyr)  # provides ddply() and .()
# add a height rank variable to the data frame
two_groups <- ddply(two_groups, .(group), transform, hrank = rank(height))
# plot the graph
p_two_groups <- ggplot(two_groups, aes(-hrank, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
# opts() and theme_blank() were removed from ggplot2; theme() and element_blank() replace them
theme(axis.text.x = element_blank()) +
geom_text(aes(y = 0, label = category, vjust = 1.5))
