How to order a plot from lowest to highest? - r

I looked at other threads where this question is answered, but I couldn't adapt the code. I plot the graph below and then try to order it from lowest to highest according to the blue color (education == 3) when time is 0. I use the following code to create the order.
country_order <- df %>%
filter(education == 3 & time==0) %>%
arrange(unemployment) %>%
ungroup() %>%
mutate(order = row_number())
However, I am not sure how to introduce the new variable order into ggplot to get the ordering I want. Could someone help?
Here is the plot
ggplot(df, aes(y = unemployment, x = time, fill = education)) +
  geom_col(color = "black") +
  facet_wrap(~ country)
Here is the data:
df= structure(list(time = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
unemployment = structure(c(25, 35, 40, 10, 20, 70, 20, 25,
55, 23, 17, 60), format.stata = "%9.0g"), education = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor"), country = structure(c(1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2), format.stata = "%9.0g")), row.names = c(NA,
-12L), class = "data.frame")

I think you can use fct_reorder() from the forcats package to reorder the factor levels of a variable by sorting along another variable.
library(forcats)
df %>%
  ggplot(aes(y = unemployment, x = time, fill = fct_reorder(education, unemployment, .desc = TRUE))) +
  geom_col(color = "black") +
  facet_wrap(~ country)
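Note that calling fct_reorder() on education changes the order of the fill groups, not the order of the panels. If the goal is instead to order the country panels by unemployment for education == 3 at time == 0, one possible approach is to turn country into a factor whose levels follow that order. The following is a minimal sketch and not part of the original answer; df is rebuilt in a simplified form, and country_levels and df_ordered are names made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Simplified rebuild of the question's data (same values, no Stata attributes)
df <- data.frame(
  time         = factor(c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0)),
  unemployment = c(25, 35, 40, 10, 20, 70, 20, 25, 55, 23, 17, 60),
  education    = factor(rep(1:3, times = 4)),
  country      = rep(c(1, 2), each = 6)
)

# Countries sorted by unemployment for education == 3 at time == 0
country_levels <- df %>%
  filter(education == "3", time == "0") %>%
  arrange(unemployment) %>%
  pull(country)

# facet_wrap() lays out panels in factor-level order
df_ordered <- df %>%
  mutate(country = factor(country, levels = country_levels))

p <- ggplot(df_ordered, aes(x = time, y = unemployment, fill = education)) +
  geom_col(color = "black") +
  facet_wrap(~ country)
```

Printing p should draw the panels in the order given by country_levels. Use arrange(desc(unemployment)) for highest to lowest.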

gganimate transition_reveal() with geom_line() breaking on the final frame?

I am trying to animate a line graph with multiple lines. It seems that there is an error with the gganimate package involving transition_reveal() that is causing the final frame to revert for all of the lines but one. This error is not present when not using gganimate. Here is the code:
df <- read.csv("test.csv", stringsAsFactors = TRUE)
anim <- ggplot(df, aes(Day, Accidents, group = State, color = State)) +
geom_line() +
transition_reveal(Day) +
ease_aes('cubic-in-out')
jiff <- animate(anim, fps = 24, duration = 5, start_pause = 0, end_pause = 72, height = 4, width = 7, units = "in", res = 150)
jiff
Here is the dput of the dataframe:
structure(list(State = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L), levels = c("A", "B", "C", "D"), class = "factor"),
Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
Accidents = c(5L, 2L, 5L, 6L, 1L, 2L, 6L, 8L, 4L, 10L, 2L,
4L)), class = "data.frame", row.names = c(NA, -12L))
Here is the output:
Regardless of the ending pause or how many values I have along the x-axis, the final frame will always look like this with only one line appearing as updated. Does anyone know why this might be happening?
UPDATE: Reverting the gganimate package from 1.0.8 to 1.0.7 did seem to do the trick after all.
The issue is in the arguments start_pause = 0, end_pause = 72. Remove or adapt them:
anim <- ggplot(df, aes(Day, Accidents, group= State, color = State)) +
geom_line() +
transition_reveal(Day) +
ease_aes('cubic-in-out')
animate(anim, fps = 24, duration = 5,
height = 4, width = 7, units = "in", res = 150)

How to use mutate function while excluding certain rows

So I am trying to create percentages to display on a plot.
Here is my dataset:
how_often_ByYear <- structure(list(Var1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L), levels = c("A few times a year",
"Never been", "Once or twice"), class = "factor"), Var2 = structure(c(1L,
1L, 1L, 2L, 2L, 2L), levels = c("Year 1", "Year 2"), class = "factor"),
Freq = c(0, 122, 47, 1, 117, 50), percent = c(0, 72, 28,
1, 69, 30)), class = "data.frame", row.names = c(NA, -6L))
And here is my code:
how_often_ByYear <- Visitor_Data_ByYear %>%
dplyr::select(How_often_have_you_visited_us, Year) #selects column for question 16
#mutate_all(funs(gsub("[[:punct:]]", "", .))) #removes annoying symbols
how_often_ByYear <- table(how_often_ByYear$How_often_have_you_visited_us, how_often_ByYear$Year)
how_often_ByYear <- as.data.frame(how_often_ByYear)
how_often_ByYear <- how_often_ByYear %>%
mutate(percent = Freq/sum(Freq)*100) %>%
mutate_if(is.numeric, round, 0)
View(how_often_ByYear)
Right now, the percentages are computed across Year 1 and Year 2 together, so each year's percentages add up to only around 50 percent. How do I calculate the percentages separately for each year so that I can report on both?
Thanks in advance for your help.
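No answer is included for this question. One common approach, sketched here under the assumption that the percentages should sum to 100 within each level of Var2, is to call group_by() before mutate() so that sum(Freq) is taken per year. The data frame is rebuilt from the dput above without the existing percent column, and by_year is a name made up for illustration.

```r
library(dplyr)

# Rebuilt from the question's dput, without the existing percent column
how_often_ByYear <- data.frame(
  Var1 = factor(rep(c("A few times a year", "Never been", "Once or twice"), times = 2)),
  Var2 = factor(rep(c("Year 1", "Year 2"), each = 3)),
  Freq = c(0, 122, 47, 1, 117, 50)
)

by_year <- how_often_ByYear %>%
  group_by(Var2) %>%                                    # one group per year
  mutate(percent = round(Freq / sum(Freq) * 100)) %>%   # sum(Freq) is now per year
  ungroup()
```

Because each value is rounded separately, the rounded percentages within a year can sum to slightly more or less than 100.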

Ggplot stacked bar plot with percentage labels

I am trying to do a stacked bar plot based on count, but with the labels showing the percentage on the plot. I have produced the plot below. However the percentage is based on all of the data. What I am after is the percentage by team (such that the sum of the percentages for Australia = 100% and the percentages for England = 100%).
The code for achieving this is the following function. It counts the number of different roles in each team across 5 matches. I have had to divide the result by 10 because a player's role appears twice for each match (5 matches x 2 appearances):
team_roles_Q51 <- function(){
ashes_df <- tidy_data()
graph <- ggplot(ashes_df %>%
count(team, role) %>% #Groups by team and role
mutate(pct=n/sum(n)), #Calculates % for each role
aes(team, n, fill=role)) +
geom_bar(stat="identity") +
scale_y_continuous(labels=function(x)x/10) + #Needs to be a better way than dividing by 10
ylab("Number of Participants") +
geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%")),
position=position_stack(vjust=0.5)) +
ggtitle("England & Australia Team Make Up") +
theme_bw()
print(graph)
}
The structure of the first 10 rows of the imported data frame is as follows:
structure(list(batter = c("Ali", "Anderson", "Bairstow", "Ball",
"Bancroft", "Bird", "Broad", "Cook", "Crane", "Cummins"), team = structure(c(2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("Australia",
"England"), class = "factor"), role = structure(c(1L, 3L, 4L,
3L, 2L, 3L, 3L, 2L, 3L, 3L), .Label = c("allrounder", "batsman",
"bowler", "wicketkeeper"), class = "factor"), innings = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("test_1_innings_1",
"test_1_innings_2", "test_2_innings_1", "test_2_innings_2", "test_3_innings_1",
"test_3_innings_2", "test_4_innings_1", "test_4_innings_2", "test_5_innings_1",
"test_5_innings_2"), class = "factor"), batting_num = c(6, 11,
7, 10, 1, NA, 9, 1, NA, 9), score = c(38, 5, 9, 14, 5, NA, 20,
2, NA, 42), balls_faced = c(102, 9, 24, 11, 19, NA, 32, 10, NA,
120)), row.names = c(NA, 10L), class = "data.frame")
Any help would be appreciated. Thanks
You need to group_by(team) before calculating the proportion, and then use pct in aes():
library(dplyr)
library(ggplot2)
ashes_df %>%
count(team, role) %>%
group_by(team) %>%
mutate(pct= prop.table(n) * 100) %>%
ggplot() + aes(team, pct, fill=role) +
geom_bar(stat="identity") +
ylab("Number of Participants") +
geom_text(aes(label=paste0(sprintf("%1.1f", pct),"%")),
position=position_stack(vjust=0.5)) +
ggtitle("England & Australia Team Make Up") +
theme_bw()
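The answer above maps pct to the y-axis, so the bar heights become percentages rather than counts. If the bars should keep showing counts and only the labels should show the within-team percentage, a variant sketch is to keep n on the y-axis and use pct only in geom_text(). This is an illustration, not the answerer's code; ashes_df is reduced here to the team and role columns of the 10 sample rows, and role_counts is a made-up name.

```r
library(dplyr)
library(ggplot2)

# team and role columns from the 10 sample rows in the question
ashes_df <- data.frame(
  team = c("England", "England", "England", "England", "Australia",
           "Australia", "England", "England", "England", "Australia"),
  role = c("allrounder", "bowler", "wicketkeeper", "bowler", "batsman",
           "bowler", "bowler", "batsman", "bowler", "bowler")
)

role_counts <- ashes_df %>%
  count(team, role) %>%               # n = rows per team/role combination
  group_by(team) %>%
  mutate(pct = n / sum(n) * 100) %>%  # percentage within each team
  ungroup()

p <- ggplot(role_counts, aes(team, n, fill = role)) +
  geom_col() +                        # bar height stays a count
  geom_text(aes(label = sprintf("%1.1f%%", pct)),
            position = position_stack(vjust = 0.5)) +
  ylab("Number of Participants") +
  ggtitle("England & Australia Team Make Up") +
  theme_bw()
```

On the full data the division by 10 described in the question would still apply to the axis labels. The percentages are unaffected by it, since they are ratios within each team.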

geom_text labels don't align with geom_bar / position_dodge2 if width is changed

I am creating a barplot in ggplot2 3 which uses facet_grid and position_dodge2(preserve="single") (= same bar width in all facets) as well as geom_text for labeling. It all works fine except when I change the width of the bars with width, e.g. to 1.2 (otherwise the bars are rather slim).
Two problems occur:
the labels of geom_text no longer align with the bars;
the bars aren't centered on the x axis as they should be.
Is there any solution to this? A workaround with hjust doesn't seem to work, since the labels are not evenly misaligned when changing width. Or am I misunderstanding the purpose of width?
This seems related to my question.
Data:
x <- structure(list(SessionLastStage = structure(1:20, .Label = c("1998-1999",
"1999-2000", "2000-2001", "2001-2002", "2002-2003", "2003-2004",
"2004-2005", "2005-2006", "2006-2007", "2007-2008", "2008-2009",
"2009-2010", "2010-2011", "2011-2012", "2012-2013", "2013-2014",
"2014-2015", "2015-2016", "2016-2017", "2017-2018"), class = "factor"),
freq = c(0, 2, 18, 8, 6, 0, 0, 0, 2, 14, 8, 16, 30, 4, 12,
10, 11, 30, 1, 0), Phase = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L
), .Label = c("Introduction", "Maintenance", "Deconsolidation"
), class = "factor")), class = "data.frame", row.names = c(NA,
-20L), .Names = c("SessionLastStage", "freq", "Phase"))
plot command:
x %>%
ggplot()+
geom_bar(aes(x=SessionLastStage, y=freq),
stat="identity",
width=1.2,
position = position_dodge2(preserve="single"))+
geom_text(data=x %>% filter(freq>0),
aes(x=SessionLastStage, y=freq+1, label=freq))+
facet_grid(.~Phase,
scales="free_x",
space = "free_x")+
theme_minimal()+
theme(axis.text=element_text(angle=90))
Output:
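No answer is included for this question. One plausible explanation, offered here as an assumption rather than a confirmed diagnosis: on a discrete x axis the categories sit one unit apart, so width = 1.2 makes neighbouring bars overlap, and position_dodge2() then shifts the overlapping bars while geom_text() stays at the category centres. A minimal sketch under that assumption keeps width at or below 1, so nothing is dodged and the labels stay centred. x_small is a shortened, hypothetical stand-in for x, and labels_df is a made-up name.

```r
library(dplyr)
library(ggplot2)

# Hypothetical, shortened stand-in for the data frame x in the question
x_small <- data.frame(
  SessionLastStage = factor(c("1998-1999", "1999-2000", "2000-2001",
                              "2016-2017", "2017-2018")),
  freq  = c(0, 2, 18, 1, 0),
  Phase = factor(c("Introduction", "Introduction", "Introduction",
                   "Deconsolidation", "Deconsolidation"),
                 levels = c("Introduction", "Deconsolidation"))
)

# Label only the non-empty bars, as in the question
labels_df <- filter(x_small, freq > 0)

p <- ggplot(x_small, aes(x = SessionLastStage, y = freq)) +
  # width <= 1 keeps neighbouring bars from overlapping
  geom_col(width = 0.9, position = position_dodge2(preserve = "single")) +
  geom_text(data = labels_df, aes(label = freq), vjust = -0.5) +
  facet_grid(. ~ Phase, scales = "free_x", space = "free_x") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90))
```

Here vjust = -0.5 replaces the y = freq + 1 offset from the question; either places the label just above the bar.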

Remove box and points in legend

How do I remove the box, ribbon color, and points in the legend? I would just like a straight line representing each color. I've tried using guides(), but nothing changes.
Sample data:
pdat1 <- structure(list(type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("10-year",
"20-year", "30-year"), class = "factor"), effect = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), value = c(0,
-21.89, -27.36, -33.75, -40.57, -47.32, 0, -23, -28.31, -34.96,
-42.6, -50.81, 0, -16.9, -22.25, -28.87, -36.4, -44.52, 0, -10.24,
-16.8, -24.74, -33.52, -42.55, 0, -10.24, -16.8, -24.74, -33.52,
-42.55, 0, -10.24, -16.8, -24.74, -33.52, -42.55), temp = c(0,
1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3,
4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5), value_max = c(2.91,
-19.02, -24.42, -30.88, -37.63, -44.35, 2.9, -20.09, -25.36,
-32.05, -39.67, -47.87, 2.97, -14.02, -19.27, -25.89, -33.49,
-41.58, 2.42, -7.74, -14.34, -22.27, -31.06, -40.02, 2.45, -7.8,
-14.36, -22.26, -31.07, -40.07, 2.46, -7.71, -14.23, -22.23,
-31.02, -40.05), value_min = c(-2.91, -24.76, -30.3, -36.63,
-43.5, -50.3, -2.9, -25.91, -31.27, -37.87, -45.52, -53.75, -2.97,
-19.77, -25.24, -31.85, -39.32, -47.46, -2.42, -12.74, -19.26,
-27.21, -35.98, -45.08, -2.45, -12.68, -19.24, -27.22, -35.96,
-45.02, -2.46, -12.77, -19.37, -27.25, -36.02, -45.05)), class = "data.frame", row.names = c(NA,
-36L), .Names = c("type", "effect", "value", "temp", "value_max",
"value_min"))
Plot Code
library(ggplot2)
ggplot(pdat1) +
geom_ribbon(aes(ymax = value_max, ymin = value_min, x = temp, linetype = NA, color = effect, fill = effect), fill = "#C0CCD9", alpha = 0.5 ) +
geom_line(aes(x = temp, y = value, color = effect, group = effect)) +
geom_point(aes(x = temp, y = value, color = effect), size = 0.5) +
ylab("Y") +
xlab("X") +
guides(color = guide_legend(keywidth = 2,
keyheight = 1,
override.aes = list(linetype = c(1, 1),
size = 1,
shape = c(0, 0)))) +
facet_wrap(~type)
Your ggplot code is a little bit messy, particularly for the ribbon. For example, the fill aesthetic is both mapped to the effect variable and set to a fixed color value (#C0CCD9).
To remove the boxes in the legend key you need to set legend.key in theme(), but it works only after cleaning up your ggplot code.
To avoid unnecessary repetition, I have moved several aesthetics to the initial ggplot() call so that ggplot uses them as defaults for the subsequent geom_XX calls.
ggplot(pdat1, aes(x = temp, y = value, group = effect)) +
geom_ribbon(aes(ymax = value_max, ymin = value_min), fill = "#C0CCD9", alpha = 0.5 ) +
geom_line(aes(color = effect)) +
geom_point(aes(color = effect), size = 0.5) +
ylab("Y") + xlab("X") +
guides(color = guide_legend(keywidth = 2, keyheight = 1,
override.aes = list(size = 1, shape = NA))) +
facet_wrap(~type) +
theme_bw() +
theme(legend.key = element_rect(fill = NA, color = NA))
