Reorder Function in R Issues [duplicate]

I am trying to use reorder() to arrange the bars from largest to smallest, but I am having issues.
ggplot(Manager_Graph_Data, aes(x = reorder(Manager_Graph_Data$`Completion Rate`), y = Manager_Graph_Data$Manager)) +
geom_bar(stat="identity",
position="identity",
fill="#0077b5")
structure(list(Manager = c("Bob Beno", "Dylan Tracy", "Ignacia Lemley",
"Jaimee Cogdill", "Jeneva Engman", "Julianne Holdren", "Lakia Farrington",
"Lester Braden", "Soon Mooneyham"), Complete = c(5, 5, 1, 4,
0, 0, 3, 2, 5), Incomplete = c(6, 6, 7, 2, 3, 4, 5, 2, 3), Total = c(11,
11, 8, 6, 3, 4, 8, 4, 8), `Completion Rate` = c(0.454545454545455,
0.454545454545455, 0.125, 0.666666666666667, 0, 0, 0.375, 0.5,
0.625)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-9L))
Any help will be much appreciated

It looks like you're trying to order by the completion rate. In a bar or column chart, ggplot orders a discrete axis by the factor levels (alphabetically, for a character column). So to change the order, set the factor levels. There are a variety of ways to do this. Here is one way:
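library(tidyverse)
# order by rate; arrange() has no `decreasing` argument, so sort ascending:
# with Manager on the y-axis, the last level (highest rate) is drawn at the top
MgrRate <- Manager_Graph_Data %>%
arrange(`Completion Rate`) %>%
mutate(Manager = ordered(Manager, levels = .$Manager))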
ggplot(MgrRate,
aes(x = `Completion Rate`,
y = Manager)) +
geom_bar(stat="identity",
position="identity",
fill="#0077b5")
In case you were not aware, if you want to set an x and y, try using geom_col() to simplify things.
# alternatively (creating the same plot)
ggplot(MgrRate,
aes(x = `Completion Rate`,
y = Manager)) +
geom_col(fill="#0077b5")
If you actually wanted to order by the manager, here's an example of how to do that. (This is by the first name of the manager.)
# order by manager's first name
Mgr <- Manager_Graph_Data %>%
arrange(desc(Manager)) %>%
mutate(Manager = ordered(Manager, levels = .$Manager))
ggplot(Mgr,
aes(x = `Completion Rate`,
y = Manager)) +
geom_col(fill="#0077b5")
Just so you are aware, when you flip the axes (put the factor on y, instead of x) you have to reverse the order, because levels on the y-axis are drawn from bottom to top.
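Since the question asked about reorder(), here is a minimal sketch of that route. reorder() from base R takes two arguments, the vector to reorder and the values to order it by; the call in the question passes only one, which is likely the source of the trouble. The data frame `d` and the column name `Rate` are stand-ins for illustration, using a subset of the question's data with rounded rates.

```r
# Stand-in data: a subset of the managers and (rounded) rates from the question
d <- data.frame(
  Manager = c("Bob Beno", "Ignacia Lemley", "Jaimee Cogdill", "Lester Braden"),
  Rate    = c(0.4545, 0.125, 0.6667, 0.5)
)

# reorder(x, X): make x a factor whose levels are sorted by X (ascending)
d$Manager <- reorder(d$Manager, d$Rate)
levels(d$Manager)  # lowest rate first, highest rate last

# On the y-axis the last level is drawn at the top, so the largest bar ends up on top.
# The plot is only built if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = Rate, y = Manager)) +
    geom_col(fill = "#0077b5")
}
```

Calling reorder() directly inside aes(), e.g. y = reorder(Manager, Rate), should give the same ordering without modifying the data frame.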

Related

Stacked histogram in R. fill not stacking

Trying to make a stacked histogram, but it just comes out grey, with no stacking. I don't understand what is different from all the examples on here, or the built in 'iris' example, unless using time as x variable is a problem.
I have a big df, in long format, cut down to 25 rows and named 'mini' for this example:
> dput(mini)
structure(list(maxdep = c(203.9540564, 212.9573869, 13.45896065,
209.961431, 162.9633891, 13.97961439, 85.48389032, 102.4905817,
100.0035986, 88.02608837, 89.02947373, 22.0301996, 20.03060219,
19.03098037, 29.03141345, 13.03170014, 82.0328164, 55.03384725,
15.03437183, 17.53463412, 37.5352136, 70.03588457, 90.53687883,
91.53861116, 10.03902594), st_time = structure(c(1633321800,
1633328510, 1633331050, 1633331285, 1633334080, 1633347960, 1633348185,
1633355115, 1633279830, 1633298825, 1633301480, 1633302985, 1633303300,
1633303600, 1633303825, 1633304280, 1633304430, 1633305635, 1633306445,
1633306610, 1633306890, 1633307310, 1633307960, 1633309380, 1633310320
), class = c("POSIXct", "POSIXt"), tzone = ""), dbin = c(2, 2,
1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1)), row.names = c(NA, 25L), class = "data.frame")
the code is simple:
gg3 <- ggplot(data = mini, aes(x = st_time, fill = dbin)) #
gg3 <- gg3 + geom_histogram(position = "stack", binwidth = 3600) # gives hourly columns in histogram
gg3
This should plot the start time of the data on the x-axis (correct) against the count on y (correct), and stack in colour by dbin value (e.g. 1 through 5), producing 5 colours of histogram stacked on top of each other (only two are present in the sample data above).
Instead I get one grey plot of all the data (25 counts in total). Please help me understand what is wrong.
You can change dbin to a factor (it is currently numeric):
library(ggplot2)
library(dplyr) # for %>%
mini %>%
ggplot(aes(x = st_time, fill = as.factor(dbin) )) +
geom_histogram(position = "stack", binwidth = 3600)
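Why the conversion matters, as a small sketch: a numeric fill is mapped to a continuous colour scale and does not define groups, so there is nothing to stack, whereas a factor has discrete levels and each level becomes its own group. The data frame `toy` and the column `dbin_f` are made up for illustration.

```r
# Made-up data: x positions and a numeric bin id, like dbin in the question
toy <- data.frame(
  x    = c(1, 1, 2, 2, 2, 3),
  dbin = c(1, 2, 1, 1, 2, 1)
)

is.numeric(toy$dbin)            # TRUE: ggplot2 treats this as continuous

toy$dbin_f <- factor(toy$dbin)  # discrete version of the same column
levels(toy$dbin_f)              # "1" "2": one fill colour per level

# The plot is only built if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = x, fill = dbin_f)) +
    geom_histogram(position = "stack", binwidth = 1)
}
```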

How to draw a stacked barplot with three categorical variables representing the proportion of only one of them for each facet in r?

This is the data to take up as a reference
df <- data.frame(a = c(3,3,3,3,3,2,2,3,2,1,1,1,3,1,3), b = c(1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2), c = c(4, 5, 3, 2, 4, 2, 3, 4, 5, 4, 4, 3, 3, 1, 2) )
I want to draw a bargraph with the proportion of a for each facet. At the same time I want the bars to be colored according to the b value.
The variable b is not relevant for calculating the percentage. This is what I came up with, when I set the fill = c, it divides the stacked color in two, one corresponding to 1, and the other as NA.
ggplot(aes(x = a, y = ...prop..., group = 1, fill = b)) +
geom_bar(position = "stack") +
facet_wrap(~c, nrow = 1, ncol = 5) +
labs(title = "Count of a among c")
how can I have a result similar to this one but with the proportions of a for each facet wrap instead of the absolute values?
Thank you!
Here's an approach using the ..count.. and ..PANEL.. special symbols:
ggplot(df, aes(x = a, fill = as.factor(b))) +
geom_bar(aes(y = ..count.. / tapply(..count..,..PANEL..,sum)[..PANEL..])) +
facet_wrap(~c, nrow = 1, ncol = 5) +
labs(title = "Count of a among c", fill = "b", y = "Proportion")
If you weren't using facet_wrap, this would be trivial: set y = ..prop.. and you are done. However, ..prop.. is not calculated properly by facet. To get around this problem, we can use tapply and the ..PANEL.. special symbol to sum ..count.. only for that panel. The trailing [..PANEL..] then indexes the resulting vector of panel totals so that each bar is divided by the total of its own panel.
The other issue you had was that b is class numeric, so you need to convert that to factor.
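The per-panel normalisation can be traced outside ggplot with plain vectors. This is only a sketch of the arithmetic; `count` and `panel` below are made-up stand-ins for ..count.. and ..PANEL.. (which ggplot2 supplies as a factor).

```r
# Made-up bar counts and the facet panel each bar belongs to
count <- c(2, 1, 3, 4)
panel <- factor(c(1, 1, 2, 2))

panel_total <- tapply(count, panel, sum)  # one total per panel: 3 and 7
per_bar     <- panel_total[panel]         # total repeated for each bar: 3 3 7 7

prop <- count / per_bar                   # each bar as a share of its own panel
tapply(prop, panel, sum)                  # each panel now sums to 1
```

Indexing by the factor uses its integer codes, which is why the bare [..PANEL..] in the answer picks the right total for every bar.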

How to make a barchart where the x-axis includes gaps

I'd like the x-axis of my barchart to be a continuous scale.
Here is my data:
list(
Century = c(1, 2, 3, 4, 5),
CenturyLabel = c("1st", "Bit later", "", "", "Post-Roman"),
Value = c(.2, .3, 0, 0, .4) ) %>% as_tibble()
I'm hoping to see bars for the 1st, 2nd, and 5th centuries with gaps for the 3rd and 4th.
The trick is to define your x-axis variable as a factor.
library("dplyr")
df <- tibble(
Century = c(1, 2, 3, 4, 5),
CenturyLabel = c("1st", "Bit later", "", "", "Post-Roman"),
Value = c(.2, .3, 0, 0, .4) )
df$CenturyFactor <- factor(df$Century, labels = df$CenturyLabel, ordered = TRUE)
You can then use CenturyFactor as the x-axis variable and you'll see a gap with any correct plotting library... With the big caveat that any duplicate labels cause the centuries to be merged!
One way around this is to plot Century (1 to 5) but tweak the labels to show CenturyLabel. This will be library-specific. No factors needed.
Using ggplot2:
library("ggplot2")
ggplot(df, aes(x = Century, y = Value)) +
geom_col() +
scale_x_continuous(labels = df$CenturyLabel, breaks = df$Century)
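The duplicate-label caveat mentioned above can be checked directly. A minimal sketch, assuming a recent R version (older versions reject duplicated labels instead of merging them); the variable names are illustrative.

```r
century <- c(1, 2, 3, 4, 5)
label   <- c("1st", "Bit later", "", "", "Post-Roman")

# Two centuries share the empty label, so they collapse into one level
f <- factor(century, labels = label)
nlevels(f)     # 4, not 5
as.integer(f)  # 1 2 3 3 4: centuries 3 and 4 map to the same level
```

This is why plotting the numeric Century and relabelling the axis is the safer route when labels repeat.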

displaying different symbols for each point within the same factor in R ggplot2

I am trying to create a plot to show the mean of calculated values within each group (organised by factors), as well as the individual points themselves. I have managed to do this successfully, however all the points use the same symbol. I want to have a different symbol for each of the points within each factor, and preferably use the same points in the same order for each factor.
An example version of the kind of graph I am currently making is below, however all the points within the same column use the same symbol.
I have thought about using the row number of the points to define the symbol shape, but I think there are only 25 different shapes available in the default ggplot2 package, and my real data has more than 25 points, plus I would prefer if the same points were used in each column, to keep the graph looking consistent.
Mean_list <- data.frame(Cells = factor(c("Celltype1", "Celltype2", "Celltype3",
"Celltype4"),
levels =c("Celltype1", "Celltype2", "Celltype3", "Celltype4")),
Mean = c(mean(c(1, 2, 3)), mean(c(5, 8, 4)), mean(c(9, 8 ,3)),
mean(c(3, 6, 8, 5))))
values_list <- data.frame(Cells2 = rep(c("Celltype1", "Celltype2", "Celltype3",
"Celltype4"), times = c(length(c(1, 2, 3)),
length(c(5, 8, 4)), length(c(9, 8 ,3)),
length(c(3, 6, 8, 5)))),
values = c(1, 2, 3, 5, 8, 4, 9, 8, 3, 3, 6, 8, 5))
ggplot() + geom_col(data = Mean_list, aes(Cells, Mean, fill = Cells)) +
geom_point(data = values_list, aes(Cells2, values))
Before plotting, we may assign a number to each row within a cell type:
library(dplyr) # for %>%, group_by() and mutate()
values_list <- values_list %>% group_by(Cells2) %>% mutate(shape = factor(seq_along(values)))
ggplot() +
geom_col(data = Mean_list, aes(Cells, Mean, fill = Cells)) +
geom_point(data = values_list, aes(Cells2, values, shape = shape))
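To see what the shape column contains, here is a base-R sketch of the same within-group numbering using ave(). It is an illustration of the idea, not the answer's own code; `cells` and `shape` are stand-in names.

```r
# Same groups and values as values_list in the question
cells  <- rep(c("Celltype1", "Celltype2", "Celltype3", "Celltype4"),
              times = c(3, 3, 3, 4))
values <- c(1, 2, 3, 5, 8, 4, 9, 8, 3, 3, 6, 8, 5)

# ave() applies seq_along() within each cell type: a running index per group
shape <- ave(values, cells, FUN = seq_along)
shape
# 1 2 3 1 2 3 1 2 3 1 2 3 4
```

Because the index restarts in every group, the first point in each column gets the same symbol, the second point the next symbol, and so on. Note that ggplot2's default shape palette only covers six discrete values; groups with more points would need scale_shape_manual() with explicit values.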

Plot overlapping vertical lines with ggplot

I have a list of time-ordered pairwise interactions. I want to plot a temporal network of these interactions, which would look something like the diagram below.
My data looks like the example below. The id1 and id2 values are the unique identifiers of individuals. The time indicates when an interaction between those individuals occurred. So at time = 1, I want to plot a connection between individual-1 and individual-2.
id1 <- c(1, 2, 1, 6, 2, 2, 1)
id2 <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df <- data.frame(id1, id2, time)
According to this StackOverflow question, I can see that it is possible to draw vertical lines between positions on the y-axis in ggplot. This is achieved by reshaping the data into a long format. This is fine when there is only one pair per time value, but not when there is more than one interacting pair at a time. For example in my dummy data, at time = 2, there are three pairs (in the plot I would show these by overlaying lines with reduced opacity).
My question is, how can I organise these data in a way that ggplot will be able to plot potentially multiple interacting pairs at specified time points?
I have been trying to reorganise the data by assigning an extra identifier to each of the multiple pairs that occur at the same time. I imagined the data table to look like this, but I haven't figured out how to make this in R... In this example the three interactions at time = 2 are identified by an extra grouping of either 1, 2 or 3. Even if I could arrange this I'm still not sure how I would get ggplot to read it.
Ultimately I'm trying to create something that looks like Fig. 2 in this scientific paper.
Any help would be appreciated!
You can do this without reshaping the data, just set one id to y and the other id to yend in geom_curve:
ggplot(df, aes(x = time, y = id1)) +
geom_curve(aes(xend = time, yend = id2), curvature = 0.3) +
geom_hline(yintercept = 1:7, colour = scales::muted("blue")) +
geom_point(size = 3) +
geom_point(aes(y = id2), size = 3) +
coord_cartesian(xlim = c(0, max(df$time) + 1)) +
theme_bw()
Libraries:
library('ggplot2')
library('data.table')
Data:
id1 <- c(1, 2, 1, 6, 2, 2, 1)
id2 <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df <- data.frame(id1, id2, time)
setDT(df)
df1 <- melt.data.table( df, id.vars = c('time'))
Plot:
p <- ggplot( df1, aes(time, value)) +
geom_point() +
geom_curve( mapping = aes(x = time, y = id1, xend = time, yend = id2, colour = "curve"),
data = df,
curvature = 0.2 )
print(p)
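The extra per-time identifier described in the question can be sketched in base R as follows, together with straight, semi-transparent vertical lines via geom_segment(). The column name `pair` is hypothetical, and this is one possible approach rather than what either answer above does.

```r
id1  <- c(1, 2, 1, 6, 2, 2, 1)
id2  <- c(2, 4, 5, 7, 3, 4, 5)
time <- c(1, 2, 2, 2, 3, 4, 5)
df <- data.frame(id1, id2, time)

# Running index of interactions within each time point
df$pair <- ave(df$time, df$time, FUN = seq_along)
df$pair
# 1 1 2 3 1 1 1

# Overlapping vertical lines with reduced opacity.
# The plot is only built if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = time, y = id1)) +
    geom_segment(aes(xend = time, yend = id2), alpha = 0.5)
}
```

Each row is drawn as its own segment, so the identifier is not strictly needed for plotting; it is mainly useful if the pairs at one time point should be offset or coloured separately.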
