arrange by two factors

I have been trying all day to order two factors, "type" and "name", by a numeric value called "score", and to plot the points grouped by type (with color determined by type) and ordered by score. I would also like the group called "ALL" on top, so that it is separated from the other 3 categories in "type". My attempts so far have been unsuccessful, and I don't understand why I can't even get the reordering right. Any help is much appreciated.
This is my data:
df = structure(list(score = c(12, 12.2, 12.5, 12.3, 12.2, 12.4, 12.5, 12.7, 12.1, 12.8, 12.4, 12.3, 12.2, 12.6, 12.8, 12.1, 12.5), range1 = c(0.003356, 1.20497, -0.128138, -42.6093, -41.1975, -44.706, -20, -46.4245, -0.543379, 2.09828, -20, -20, -44.2262, -46.6559, -20, -20, 2.37709), point = c(1.56805, 2.11176, 0.1502, -22.6093, -21.1975, -24.706, -0.491829, -26.4245, 2.49973, 2.94457, 0.0443572, 0.0208999, -24.2262, -26.6559, 2.69408, 3.22951, 3.33255), range2 = c(2.3767, 2.73239, 0.430373, 4.34247, 4.96875, 3.78027, 1.91331, 4.07937, 3.54538, 3.5491, 1.87162, 2.41067, 5.26578, 4.50965, 4.55967, 5.05772, 3.97742), type = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("ALL", "A", "B", "C"), class = "factor"), name = structure(c(13L, 14L, 15L, 1L, 4L, 5L, 6L, 8L, 12L, 17L, 2L, 3L, 7L, 9L, 10L, 11L, 16L), .Label = c("B_vision1", "C_vision2", "C_vision3", "B_vision4", "B_vision5", "A_vision2", "C_vision4", "B_vision6", "C_vision6", "C_vision5", "C_vision1", "B_vision7", "B_ALL", "C_ALL", "A", "C_vision7", "B_vision3"), class = "factor")), .Names = c("score", "range1", "point", "range2", "type", "name"), row.names = c(NA, -17L), class = "data.frame")
I have tried all these options:
df$name2 = reorder(df$name, -df$score)
# df$name <- reorder(df$name, -df$score)
df <- transform(df, category2 = factor(paste(name, type)))
df <- transform(df, category2 = reorder(category2, score))
#library(plyr)
#df = arrange(df,type, name)
ggplot(df, aes(x=name, y=point, ymin=range1, ymax=range2, colour=type)) +
geom_pointrange() +
coord_flip()
or
ggplot(df, aes(x=category2, y=point, ymin=range1, ymax=range2, colour=type)) +
geom_pointrange() +
coord_flip()
I am trying to get something similar to the grouped forest plot on this question but with each group defined by names and reordered by score.

I think I've interpreted what you're trying to do correctly, but I might have got it wrong.
The names (and scores) can be ordered by the sorted list of scores as
ordered.names <- as.character(df$name)[order(df$score)]
ordered.scores <- as.character(df$score)[order(df$score)]
Re-ordering the name levels (with the score annotated) is then
df$name <- factor(df$name, levels=ordered.names, labels=paste(ordered.names, "(", ordered.scores, ")"))
Plotting these with ggplot:
library(ggplot2)
ggplot(df, aes(x=name, y=point, ymin=range1, ymax=range2, group=type, color=type)) +
geom_pointrange() +
theme(axis.text.x=element_text(angle=90, hjust=0))
This draws the names along the x-axis in order of increasing score.
If you want this split up by type as well, you can facet the plot:
ggplot(df, aes(x=name, y=point, ymin=range1, ymax=range2, group=type, color=type)) +
geom_pointrange() +
theme(axis.text.x=element_text(angle=90, hjust=0)) +
facet_wrap(~type, ncol=4, scales="free_x")
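The snippets above do not put the "ALL" group on top, which the question also asked for. One possible way, sketched here in base R on a cut-down copy of the data (the six rows and the object name `toy` are illustrative, not the full data set), is to order the rows by type first and by score second, then use that order as the factor levels. Because coord_flip() draws the first level at the bottom, the levels are reversed so that "ALL" ends up at the top.

```r
# Cut-down, illustrative copy of the question's data (not the full 17 rows)
toy <- data.frame(
  name  = c("B_ALL", "C_ALL", "A", "B_vision1", "B_vision4", "C_vision2"),
  type  = factor(c("ALL", "ALL", "A", "B", "B", "C"),
                 levels = c("ALL", "A", "B", "C")),
  score = c(12, 12.2, 12.5, 12.3, 12.2, 12.4),
  stringsAsFactors = FALSE
)

# Sort by type (in level order, so "ALL" comes first), then by score within type
ord <- order(toy$type, toy$score)

# coord_flip() puts the first level at the bottom, so reverse to get "ALL" on top
toy$name <- factor(toy$name, levels = rev(toy$name[ord]))

levels(toy$name)
```

The same two lines should carry over to the full df (using as.character(df$name) if name is already a factor), after which the geom_pointrange() + coord_flip() code from the question can be used as it is.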

Related

How to add geom_line to stacked barplot in r

Below is my code. I am trying to add a line (with data from a different CSV file) on top of a stacked barplot, but it won't work; the error says "object variable not found". Without the geom_line the stacked barplot works, so I assume the line is what creates the issue. Any ideas on how to fix this?
a <- read.csv("data.csv", header=TRUE, sep=",")
line1 <- read.csv("data1.csv", header=TRUE, sep=",")
line2 <- data.frame(line1)
library(reshape2)
c <- melt(a, id.var="day")
library(ggplot2)
a <- ggplot(c, aes(x=day, y=value, fill=variable)) +
geom_bar(stat="identity", aes(x=day, y=value), width=0.7) +
geom_line(data=line2, aes(x=day, y=value), color="black", stat="identity")
+
scale_fill_manual(values = c("black", "grey47", "grey")) +
scale_x_continuous(breaks = round(seq(min(m$day), max(m$day), by = 1),0))
print(a)

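The following is a complete code example to produce the graph.
I have changed your variables' names to make them more consistent: you had named both the data.frame read from "data.csv" and the result of your ggplot instruction a. The fill aesthetic is also moved from ggplot() into geom_bar(), so that geom_line() no longer inherits a mapping to variable, a column that the line data does not have.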
library(reshape2)
library(ggplot2)
a <- read.csv("~/data.csv")
line1 <- read.csv("~/data1.csv")
long <- melt(a, id.var = "day")
g <- ggplot(long, aes(x = day, y = value)) +
geom_bar(aes(x = day, y = value, fill = variable),
stat = "identity", width = 0.7) +
geom_line(data = line1,
aes(x = day, y = value),
color = "black") +
scale_fill_manual(values = c("black", "grey47", "grey")) +
scale_x_continuous(breaks = min(long$day):max(long$day))
print(g)
Data in dput format.
a <-
structure(list(day = 1:31, emigration = c(6L, 6L, 6L, 6L, 5L,
3L, 1L, 9L, 8L, 7L, 6L, 4L, 3L, 1L, 2L, 4L, 5L, 6L, 8L, 7L, 5L,
4L, 1L, 2L, 4L, 9L, 8L, 7L, 6L, 4L, 3L), security = c(5L, 5L,
5L, 5L, 6L, 6L, 8L, 9L, 9L, 9L, 8L, 8L, 5L, 7L, 7L, 6L, 5L, 5L,
4L, 3L, 2L, 2L, 2L, 2L, 4L, 9L, 7L, 6L, 4L, 3L, 2L), checkin = c(4,
6, 9, 1, 3, 5, 7, 9, 8, 6, 4, 2, 1, 3, 4, 5, 6, 7, 8, 8, 2, 1,
2, 3, 4, 5, 7, 8, 9, 1, 1)), class = "data.frame",
row.names = c(NA, -31L))
line1 <-
structure(list(day = 1:31, value = c(12, 11, 10, 8, 7, 6, 6,
6, 7, 8, 14, 6, 6, 6, 8, 8, 10, 10, 12, 12, 12, 13, 13, 14, 15,
15, 10, 10, 10, 10, 12)), class = "data.frame",
row.names = c(NA, -31L))
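As an alternative to moving fill into geom_bar(), a layer can be told not to inherit the plot-level aesthetics at all, via inherit.aes = FALSE. The sketch below uses small made-up data frames (`bars` and `line_df` are hypothetical names, not the CSV files from the question) and keeps fill in the ggplot() call.

```r
library(ggplot2)

# Made-up long-format bar data: two stacked categories per day
bars <- data.frame(
  day      = rep(1:3, each = 2),
  variable = rep(c("u", "v"), times = 3),
  value    = c(1, 2, 2, 1, 3, 1)
)

# Made-up line data: note that it has no 'variable' column
line_df <- data.frame(day = 1:3, value = c(3, 3, 4))

p <- ggplot(bars, aes(x = day, y = value, fill = variable)) +
  geom_col(width = 0.7) +
  # inherit.aes = FALSE stops this layer from picking up fill = variable
  geom_line(data = line_df, aes(x = day, y = value),
            inherit.aes = FALSE, colour = "black")

# Building the second layer succeeds because 'variable' is never looked up in line_df
line_layer <- layer_data(p, 2)
```

Both approaches avoid the same lookup; which one reads better probably depends on how many layers share the fill mapping.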

Based on your comments about your data structure, I suppose it might help to join your data frames first and then build the plot from a single dataset. You can try:
library(dplyr)
c <- c %>%
left_join(line2 %>%
rename(value_line2 = value),
by="day")
Then adjust geom_line():
geom_line(data=c, aes(x=day, y=value_line2), color="black", stat="identity")
This might help. Please tell me if joining the data doesn't work as intended.

In case it wasn't clear, this is what I meant in my comment above:
library(ggplot2)
a <- ggplot(c, aes(x=day, y=value)) +
geom_bar(stat="identity", aes(x=day, y=value, fill=variable), width=0.7) +
geom_line(data=line2, aes(x=day, y=value), color="black", stat="identity")

What is the "N = 1" box in my R geom_bar legend, and how do I remove?

These are the data:
structure(list(Group.1 = c((name list)
), Group.2 = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Radio", "Video", "Engineering",
"800Mhz", "PSSRP", "Other"), class = "factor"), x = c(93.5, 208.75,
214, 48, 66.33, 71.5, 19.5, 64.75, 17, 39, 30.75, 96.75, 30,
19, 32.5, 12.75, 47.25, 14, 22.25, 12, 3, 128.5, 9.5, 303.2,
290.35, 364.05, 333.25, 11.75, 553.25, 423, 6, 496)), .Names = c("Group.1",
"Group.2", "x"), row.names = c(NA, -32L), class = "data.frame")
running this plot:
ggplot(data = HrSums, aes(x = Group.1, y = x, fill = Group.2)) +
geom_bar(stat = "sum", position = position_stack(reverse = TRUE)) +
coord_flip() +
labs(title = "Hours Billed, by Technician and Shop", y = "Hours Billed",
x = "Technician", fill = "Shop")
I get this bar chart:
What is the "n" box, and how do I remove it (only) from the legend?

I believe the n box is because geom_bar expects to count the number of times each combination of Group.1 and Group.2 occurs, but instead you're giving a y value in your aes. geom_bar can use a different stat instead of counting, but if you want the sums of values, it expects a weight aesthetic. Here are two ways to do this, one using weight = x in geom_bar, and one that uses dplyr functions to calculate sums beforehand, then supplies this to y.
library(tidyverse)
ggplot(df, aes(x = Group.1, fill = Group.2)) +
geom_bar(aes(weight = x), position = position_stack(reverse = TRUE)) +
coord_flip()
df_sums <- df %>%
group_by(Group.1, Group.2) %>%
summarise(x = sum(x))
ggplot(df_sums, aes(x = Group.1, y = x, fill = Group.2)) +
geom_col(position = position_stack(reverse = TRUE)) +
coord_flip()

If you include the following in the geom_bar() call, you'll only see legends for the aesthetics you're expecting:
show.legend = c(
"x" = TRUE,
"y" = TRUE,
"alpha" = FALSE,
"color" = FALSE,
"fill" = TRUE,
"linetype" = FALSE,
"size" = FALSE,
"weight" = FALSE
)
See show.legend argument on ?geom_bar:
show.legend logical. Should this layer be included in the legends?
NA, the default, includes if any aesthetics are mapped. FALSE never
includes, and TRUE always includes. It can also be a named logical
vector to finely select the aesthetics to display.
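For context, the named vector goes inside the layer call. A minimal sketch, assuming the extra "n" legend is the size legend that stat = "sum" adds (that stat maps size to its computed count n by default); the data frame `hrs` and its technician names are invented for illustration:

```r
library(ggplot2)

# Invented stand-in for HrSums (technician names are placeholders)
hrs <- data.frame(
  Group.1 = c("tech1", "tech2", "tech3"),
  Group.2 = c("Radio", "Video", "Radio"),
  x       = c(93.5, 208.75, 214)
)

p <- ggplot(hrs, aes(x = Group.1, y = x, fill = Group.2)) +
  # Keep the fill legend and suppress only the size legend for this layer
  geom_bar(stat = "sum",
           position = position_stack(reverse = TRUE),
           show.legend = c(fill = TRUE, size = FALSE)) +
  coord_flip()

# The layer still builds; stat = "sum" yields one row per x/y/fill combination
bar_layer <- layer_data(p, 1)
```

Adding guides(size = "none") to the plot is another way to drop a single legend. Whether the "n" legend appears at all may depend on the ggplot2 version in use.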

Order of stacked bars ggplot2 - Soil profile

The documentation for bar charts in ggplot2 says (see example 3):
Bar charts are automatically stacked when multiple bars are placed at the same location. The order of the fill is designed to match the legend.
For some reason the second sentence doesn't work for me. Here is an example data set, which represents soil layers above (leaf litter etc.) and below ground (actual soil):
df <- structure(list(horizon = structure(c(5L, 3L, 4L, 2L, 1L, 5L,
3L, 4L, 2L, 1L, 5L, 3L, 4L, 2L, 1L, 5L, 3L, 4L, 2L, 1L, 5L, 3L,
4L, 2L, 1L, 5L, 3L, 4L, 2L, 1L), .Label = c("A", "B", "F", "H",
"L"), class = "factor"), site = structure(c(1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L), .Label = c("A", "B", "C",
"D", "E", "F"), class = "factor"), value = c(2.75, 0.5, 0.25,
-4.125, -3.375, 3.78125, 1.375, 0.625, -10.6875, -6.34375, 4.28,
2.065, 0.68, -12.1, -10.75, 8.583333333, 4.541666667, 2.166666667,
-10.70833333, -4.25, 7.35, 4, 1.8, -13.95, -5.175, 1.933333333,
1.245833333, 0.641666667, -11.16666667, -2.291666667)), .Names = c("horizon",
"site", "value"), class = "data.frame", row.names = c(NA, -30L
))
Now I try to plot the data by first specifying the order of the soil layer levels (i.e. horizons, from above to below ground):
require(ggplot2); require(dplyr)
df %>%
mutate(horizon = factor(horizon, levels = c("L","F","H","A","B"))) %>%
ggplot(aes(site, value)) + geom_col(aes(fill = horizon)) + labs(y = "Soil depth (cm)")
It works for L, F, H but not for A, B (below ground, i.e. negative values). The reason why it probably doesn't work is that the stacked bars are sorted from largest to smallest by site (for both positive and negative values separately) and then stacked in a top to bottom approach. Is this correct? If that's the case, then for my positive values it was just coincidence that the legend matched the stacked bars I believe.
What I would like to achieve is a stacking of the bars that matches the order (top to bottom) in the legend and hence also the soil profile when looking at it in a cross-sectional view and I am not sure how to approach this.
I did try to change the sorting behaviour in general but it produced the same plot as above:
df %>%
mutate(horizon = factor(horizon, levels = c("L","F","H","A","B"))) %>%
arrange(desc(value)) %>%
ggplot(aes(site, value)) + geom_col(aes(fill=horizon)) + labs(y = "Soil depth (cm)")
df %>%
mutate(horizon = factor(horizon, levels = c("L","F","H","A","B"))) %>%
arrange(value) %>%
ggplot(aes(site, value)) + geom_col(aes(fill=horizon)) + labs(y = "Soil depth (cm)")
I probably have to sort positive and negative values separately, that is descending and ascending, respectively?

Sorting in a stacked bar plot is done according to the levels of the corresponding factor. The potential problem arises with negative values, which are stacked in reverse (from the negative top towards 0). To illustrate the problem, let's make all the values negative:
df %>%
mutate(horizon = factor(horizon, levels = c("L","F","H","B","A"))) %>%
ggplot(aes(site, value - 20)) + geom_col(aes(fill = horizon)) + labs(y = "Soil depth (cm)")
A workaround is to specify a different order of levels which will result in the wanted fill order (in this case: levels = c("L","F","H","B","A")) and manually adjust the legend using scale_fill_discrete:
df %>%
mutate(horizon = factor(horizon, levels = c("L","F","H","B","A"))) %>%
ggplot(aes(site, value)) + geom_col(aes(fill = horizon)) + labs(y = "Soil depth (cm)")+
scale_fill_discrete(breaks = c("L","F","H","A","B"))
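To see why swapping "A" and "B" works, it can help to inspect the stacked positions that ggplot2 computes. The sketch below uses only site A from the data above (the object names `site_a`, `p` and `stacked` are illustrative) and reads the bar extents back with layer_data(). Negative bars are stacked downward from zero starting with the last factor level, so listing "B" before "A" places A directly under the surface and B beneath it.

```r
library(ggplot2)

# Site A only, values copied from the question's data
site_a <- data.frame(
  site    = "A",
  horizon = factor(c("L", "F", "H", "B", "A"),
                   levels = c("L", "F", "H", "B", "A")),  # B before A, as in the workaround
  value   = c(2.75, 0.5, 0.25, -4.125, -3.375)
)

p <- ggplot(site_a, aes(site, value)) +
  geom_col(aes(fill = horizon)) +
  scale_fill_discrete(breaks = c("L", "F", "H", "A", "B"))

# ymin/ymax are the stacked extents of each segment; group follows the level order
stacked <- layer_data(p, 1)[, c("group", "ymin", "ymax")]
stacked
```

In this sketch group 5 is horizon A and group 4 is horizon B, so the A segment should start at 0 and the B segment should end at the total depth of the two layers.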

Reordering factor for plotting using forcats and ggplot2 packages from tidyverse

First of all, thanks^13 to tidyverse. I want the bars in the chart below to follow the same factor levels reordered by forcats::fct_reorder (). Surprisingly, I see different order of levels in the data set when View ()ed as when they are displayed in the chart (see below). The chart should illustrate the number of failed students before and after the bonus marks (I want to sort the bars based on the number of failed students before the bonus).
MWE
ggplot (df) +
geom_bar (aes (forcats::fct_reorder (subject, FailNo, .desc= TRUE), FailNo, fill = forcats::fct_rev (Bonus)), position = 'dodge', stat = 'identity') +
theme (axis.text.x=element_text(angle=45, vjust=1.5, hjust=1.5, size = rel (1.2)))
Data output of dput (df)
structure(list(subject = structure(c(1L, 2L, 5L, 6L, 3L, 7L,
4L, 9L, 10L, 8L, 12L, 11L, 1L, 2L, 5L, 6L, 3L, 7L, 4L, 9L, 10L,
8L, 12L, 11L), .Label = c("CAB_1", "DEM_1", "SSR_2", "RRG_1",
"TTP_1", "TTP_2", "IMM_1", "RRG_2", "DEM_2", "VRR_2", "PRS_2",
"COM_2", "MEB_2", "PHH_1", "PHH_2"), class = "factor"), Bonus = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("After", "Before"), class = "factor"),
FailNo = c(29, 28, 20, 18, 15, 13, 12, 8, 5, 4, 4, 2, 21,
16, 16, 14, 7, 10, 10, 5, 3, 4, 4, 1)), .Names = c("subject",
"Bonus", "FailNo"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-24L))
Bar chart
The issue
According to the table above, SSR_2 should rank fifth and IMM_1 sixth; in the chart, however, these two subjects swap positions. How can I get the right order with the tidyverse in this case?

Use factor with unique levels for your x-axis.
ggplot(df) +
  geom_bar(aes(factor(forcats::fct_reorder(subject, FailNo, .desc = TRUE),
                      levels = unique(subject)),
               FailNo,
               fill = forcats::fct_rev(Bonus)),
           position = 'dodge', stat = 'identity') +
  theme(axis.text.x = element_text(angle = 45, vjust = 1.5, hjust = 1.5, size = rel(1.2)))

Sort FailNo before the bonus
library(dplyr)
df_before_bonus <- df %>% filter(Bonus == "Before") %>% arrange(desc(FailNo))
Use FailNo before the bonus to create the factor
df$subject <- factor(df$subject, levels = df_before_bonus$subject, ordered = TRUE)
Updated plot
ggplot(df) +
geom_bar(aes (x = subject, y = FailNo, fill = as.factor(Bonus)),
position = 'dodge', stat = 'identity') +
theme (axis.text.x=element_text(angle=45, vjust=1.5, hjust=1.5, size = rel (1.2)))
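A likely reason for the swap described in the question is that fct_reorder() summarises all values of each level with its default .fun = median, so both the "Before" and the "After" rows contribute to the ordering. The base R sketch below reproduces that summary for the two subjects in question, using their FailNo values from the dput output (the object names `fails`, `med` and `before_order` are illustrative).

```r
# The four relevant rows from the question's data
fails <- data.frame(
  subject = c("SSR_2", "IMM_1", "SSR_2", "IMM_1"),
  Bonus   = c("Before", "Before", "After", "After"),
  FailNo  = c(15, 13, 7, 10),
  stringsAsFactors = FALSE
)

# Median over Before and After together: what the default summary sees
med <- tapply(fails$FailNo, fails$subject, median)
med

# Ordering by the "Before" rows only gives the ranking the question expects
before <- fails[fails$Bonus == "Before", ]
before_order <- before$subject[order(-before$FailNo)]
before_order
```

The combined median ranks IMM_1 above SSR_2, while the "Before" values alone rank SSR_2 first, which is why building the levels from the filtered data restores the intended order.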

With both stacked and dodged bars, how can you remove dodge-bar elements from legend?

Thanks to the question on how to combine stacked bars and dodged bars, I created the plot below using the data frame shown. But now, since the axis labels already name the bars, how can I remove the legend elements other than those for the one stacked bar? That is, can the legend show only the segments of the Big8 bar?
> dput(combo)
structure(list(firm = structure(c(12L, 1L, 11L, 13L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("Avg.", "Co", "Firm1",
"Firm2", "Firm3", "Firm4", "Firm5", "Firm6", "Firm7", "Firm8",
"Median", "Q1", "Q3"), class = "factor"), metric = structure(c(5L,
1L, 4L, 6L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Avg.",
"Big8", "Co", "Median", "Q1", "Q3"), class = "factor"), value = c(0.0012,
0.0065, 0.002, 0.0036, 0.0065, 0.000847004466666667, 0.000658907411111111,
0.0002466389, 8.41422555555556e-05, 8.19149222222222e-05, 7.97185555555556e-05,
7.82742555555556e-05, 7.56679888888889e-05), grp = structure(c(1L,
2L, 3L, 6L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Q1",
"Avg.", "Median", "Co", "Big8", "Q3"), class = "factor")), .Names = c("firm",
"metric", "value", "grp"), row.names = c(NA, -13L), class = "data.frame")
Here is the plotting code.
ggplot(combo, aes(x=grp, y=value, fill=firm)) +
geom_bar(stat="identity") +
labs(x = "", y = "") +
theme(legend.position = "bottom") +
guides(fill = guide_legend(nrow = 2))
The plot, which ideally would have a smaller set of elements in the legend.

You can manually set the breaks for scale_fill_discrete:
library(ggplot2)
ggplot(combo, aes(x=grp, y=value, fill=firm)) +
geom_bar(stat="identity") +
labs(x = "", y = "") +
theme(legend.position = "bottom") +
guides(fill = guide_legend(nrow = 2)) +
scale_fill_discrete(breaks = combo$firm[combo$metric=="Big8"])
I'm not 100% sure which labels you want to keep, but a manually entered vector, combo$firm and combo$metric will all work.
