Let's say I have an example data frame:
frame <-
data.frame(group = c(rep(1, 3), rep(2, 3)),
idea = c(1, 2, 3, 1, 2, 4),
value = c(10000, 5000, 50, 5000, 7500, 100),
level = sample(c("rough", "detailed"), 6, TRUE))
I'd like a barplot of values with each idea within a group ordered by its value. I can get close like this:
library(dplyr)
library(ggplot2)
top_ideas <-
frame %>%
group_by(group) %>%
arrange(group, desc(value))
frame %>%
group_by(group) %>%
mutate(idea = idea %>% factor(levels = top_ideas$idea)) %>%
ggplot(aes_string(x = "idea", y = "value", fill = "level")) +
geom_bar(stat = "identity") +
theme(legend.position = "bottom",
axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
facet_wrap(~group, scales = "free")
The mutate in the final dplyr chain sets the factor levels according to their ordering in the top_ideas data frame above it. Unfortunately, because idea numbers 1 and 2 are shared by both groups 1 and 2, the ordering is set by the first group.
What I'd like is for the ordering of ideas in each facet to be independent of the other groups. How can I do that in a dplyr chain? Am I missing something simple?
I should note that this is example data. The actual data is much larger and encompasses more groups and more ideas that are shared.
Here is a workaround:
Data:
# setting seed to make solution reproducible
set.seed(123)
frame <-
data.frame(group = c(rep(1, 3), rep(2, 3)),
idea = c(1, 2, 3, 1, 2, 4),
value = c(10000, 5000, 50, 5000, 7500, 100),
level = sample(c("rough", "detailed"), 6, TRUE))
Code:
library(dplyr)
library(tidyr)
library(ggplot2)
top_ideas <-
frame %>%
group_by(group) %>%
arrange(group, desc(value)) %>%
unite("grp_idea", group, idea, sep = "_", remove = FALSE) %>%
data.frame() %>%
mutate(grp_idea = factor(grp_idea, levels = grp_idea))
top_ideas %>%
group_by(group) %>%
ggplot(aes(x = grp_idea, y = value, fill = level)) +
geom_bar(stat = "identity") +
theme(legend.position = "bottom",
axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
facet_wrap(~group, scales = "free") +
xlab("idea") +
scale_x_discrete(breaks = top_ideas$grp_idea,
labels = top_ideas$idea)
Results:
> top_ideas
grp_idea group idea value level
1 1_1 1 1 10000 rough
2 1_2 1 2 5000 detailed
3 1_3 1 3 50 rough
4 2_2 2 2 7500 detailed
5 2_1 2 1 5000 detailed
6 2_4 2 4 100 rough
Note:
Basically what I'm doing is to paste together group and idea variables, convert the new variable grp_idea to a factor with the desired levels, and use that as the x-axis instead of the original idea column. This ensures that the ordering of levels in each facet will not be affected by other facets since they no longer share the same levels. It is then easy enough to relabel the x-axis title and tick labels with xlab and scale_x_discrete.
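For reference, the level-building step can also be sketched in base R alone (no dplyr or tidyr), assuming the same example frame without the random level column. The names sorted, key and tick_labels are only illustrative:

```r
# Example data from the question, minus the random 'level' column
frame <- data.frame(group = c(1, 1, 1, 2, 2, 2),
                    idea  = c(1, 2, 3, 1, 2, 4),
                    value = c(10000, 5000, 50, 5000, 7500, 100))

# Sort by group, then by descending value within each group
sorted <- frame[order(frame$group, -frame$value), ]

# Composite key: unique per group/idea pair, so facets no longer share levels
key <- paste(sorted$group, sorted$idea, sep = "_")
sorted$grp_idea <- factor(key, levels = key)

# Lookup from composite key back to the plain idea number (for tick labels)
tick_labels <- setNames(sorted$idea, key)

levels(sorted$grp_idea)
# "1_1" "1_2" "1_3" "2_2" "2_1" "2_4"
```

Within group 2 the levels now run 2_2, 2_1, 2_4, which is the by-value order for that group alone.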
I am trying to group the time and date axes of my scatter plot into epochs/time periods, for example times between 12pm-7:59pm and 9pm-11:59pm. I want to do something similar for the dates.
I am fairly new to R, so I am just looking for suggestions, or to be told whether this is even possible, and maybe some alternatives :)
This is my code so far:
accident <- read.csv("accidents.csv",header = TRUE)
accident %>%
ggplot() +
# alpha is a fixed setting rather than a mapped variable, so it goes outside aes()
geom_point(mapping = aes(x = Time, y = Date), alpha = 0.5)
Thank you!
Welcome to R! Here is one set of options.
library(tidyverse)
library(lubridate)
First, simulate dataset
accident <-
rnorm(n = 1000, mean = 1500000000, sd = 1000000) %>%
tibble(date_time = .) %>%
mutate(date_time = as.POSIXct(date_time, origin = "1970-01-01")) %>%
separate(date_time, into = c("date", "time"), sep = " ", remove = F)
Original plot:
accident %>%
ggplot()+
geom_point(aes(x=time, y=date), alpha=0.5)
Step 1: Collapse the x axis into smaller number of groups
accidents_per_trihour <-
accident %>%
# take the hour of day (0-23) directly rather than slicing the printed timestamp
mutate(hour = hour(date_time),
# right = FALSE gives three-hour bins [0,3), [3,6), ..., [21,24]
tri_hour = cut(hour, c(0, 3, 6, 9, 12, 15, 18, 21, 24), right = FALSE, include.lowest = T)) %>%
group_by(date, tri_hour) %>%
count()
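To see what cut() does with these breaks, a small base R sketch with a handful of made-up hours may help. Note that with the default right = TRUE the bins are closed on the right, so hour 3 lands in the first bin together with 0, 1 and 2. The sketch passes right = FALSE, which is one way (an assumption about what you want) to get bins of three whole hours each:

```r
# Made-up hours of day, purely for illustration
hours  <- c(0, 2, 3, 12, 21, 23)
breaks <- seq(0, 24, by = 3)

# Default: right-closed bins, with include.lowest = TRUE so that 0 is kept
right_closed <- cut(hours, breaks = breaks, include.lowest = TRUE)

# Left-closed bins: [0,3), [3,6), ..., [21,24]
left_closed <- cut(hours, breaks = breaks, right = FALSE, include.lowest = TRUE)

as.character(left_closed)
# "[0,3)" "[0,3)" "[3,6)" "[12,15)" "[21,24]" "[21,24]"

# Hour 3 shares a bin with hour 0 only in the right-closed version
right_closed[3] == right_closed[1]   # TRUE
left_closed[3] == left_closed[1]     # FALSE
```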
Then scale dot size by number of accidents
accidents_per_trihour %>%
ggplot()+
geom_point(aes(x=tri_hour, y=date, size = n), alpha=0.5) +
labs(x = "\nTime (in three-hour groups)", y = "Day\n", size = "Accidents count")
Still not great because the y axis is too expansive. So:
Step 2: Collapse the y axis into smaller number of groups
(For your data you may need to group into months for things to start to look reasonable)
accidents_per_trihour_per_week <-
accident %>%
# take the hour of day (0-23) directly rather than slicing the printed timestamp
mutate(hour = hour(date_time),
# right = FALSE gives three-hour bins [0,3), [3,6), ..., [21,24]
tri_hour = cut(hour, c(0, 3, 6, 9, 12, 15, 18, 21, 24), right = FALSE, include.lowest = T)) %>%
mutate(week_start = floor_date(as.Date(date), unit = "weeks"),
week = format.Date(week_start, "%Y, week %W")) %>%
group_by(week, tri_hour) %>%
count()
Should be much more readable now
We’ll improve the theme as well, just because.
if (!require(ggthemr)) devtools::install_github('cttobin/ggthemr')
ggthemr::ggthemr("flat") ## helps with pretty theming
accidents_per_trihour_per_week %>%
ggplot()+
geom_point(aes(x=tri_hour, y=week, size = n), alpha = 0.9) +
labs(x = "\nTime (in three-hour groups)", y = "Week\n", size = "Accidents count")
Could also do a tile plot
accidents_per_trihour_per_week %>%
ggplot() +
geom_tile(aes(x = tri_hour, y = week, fill = n)) +
geom_label(aes(x = tri_hour, y = week, label = n), alpha = 0.4, size = 2.5, fontface = "bold") +
labs(x = "\nTime (in three-hour groups)", y = "Week\n", fill = "Accidents count")
Created on 2021-11-24 by the reprex package (v2.0.1)
I have a table in Excel that shows which weapons were used to commit murder per year.
I want to read the data and plot it as a line plot with multiple lines:
However my code only gives me a gibberish graph:
This is my code:
library("readxl")
library(data.table)
library(ggplot2)
# Read excel data
res <- as.data.frame(readxl::read_excel("murders_per_modus_veapon.xlsx", sheet = 1))
res$r = c('Unknown','Knife/stabbing weapon','Axe','Firearms','Suffocation','Blunt violence','Other')
res = reshape::melt(res)
# Plot data
ggplot(res, aes(x=variable,y=value))+ geom_line()
You need to convert your variable column from factor to date. See below:
library(reshape2)
library(dplyr)
library(lubridate)
library(ggplot2)
reshape2::melt(res, value.name = "MR", variable.name = "Year") %>%
mutate(Year = make_date(as.character(Year), 1, 1)) %>%
ggplot(., aes(x=Year, y=MR, color = Modus)) +
geom_line() +
scale_x_date(date_breaks = "1 year",date_labels = "%Y")
#> Using Modus as id variables
Created on 2021-05-25 by the reprex package (v2.0.0)
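As a side note on why the conversion goes through as.character(): melt() returns the former column names as a factor, and calling as.integer() or as.numeric() directly on a factor returns the internal level codes, not the years. A minimal base R illustration with a made-up factor (year_factor, years and year_dates are illustrative names only):

```r
# A factor like the 'variable' column that melt() produces
year_factor <- factor(c("2018", "2019", "2020"))

# Converting the factor directly yields the level codes, not the years
as.integer(year_factor)
# 1 2 3

# Going through character first recovers the actual years
years <- as.integer(as.character(year_factor))
years
# 2018 2019 2020

# One way to turn them into dates: January 1st of each year
year_dates <- as.Date(paste0(years, "-01-01"))
```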
Data:
read.table(text = "Modus 2018 2019 2020
Unknown 2 0 0
Knife_stabbing_weapon 8 14 16
Axe 1 1 0
Firearms 3 3 2
Suffocation 2 5 6
Blunt_violence 8 3 4
Other 1 1 0",
header = T, stringsAsFactors = F, check.names = F) -> res
You can still run this part of your own code to import the data, and then use mine for plotting:
res <- as.data.frame(readxl::read_excel("murders_per_modus_veapon.xlsx", sheet = 1))
res$r = c('Unknown','Knife/stabbing weapon','Axe','Firearms',
'Suffocation','Blunt violence','Other')
Since you have not provided reproducible data, I created a tiny data set that may look like a small subset of your original xlsx data:
df = data.frame(weapon = c("Unknown", "knife", "axe"),
x2018 = c(2, 8, 1),
x2019 = c(0, 14, 1),
x2020 = c(0, 16, 0))
Then I make the data tidy using dplyr and tidyr. Finally, I produce a line plot that may be what you are looking for:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(cols = 2:4, names_to = "year", values_to = "amount") %>%
mutate(year = gsub("x", "", year)) %>%
ggplot(aes(as.numeric(year), amount, col = weapon)) +
geom_line() +
scale_x_continuous(breaks = c(2018, 2019, 2020))
This could be an option. Thanks to M-- for the data.
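library(dplyr)
library(tidyr)
library(ggplot2)
library(ggrepel)
# df is assumed to hold the wide data from M--'s answer (named res there)
df <- df %>%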
pivot_longer(
cols = c("2018", "2019", "2020")
) %>%
mutate(label = if_else(name == max(name), as.character(Modus), NA_character_)) %>%
mutate(name = as.factor(name))
p <- ggplot(df, aes(x=name, y=value, colour=Modus, group=Modus)) +
geom_point () +
geom_line(size = 0.8) +
theme_bw()
p + geom_label_repel(aes(label = label),
nudge_x = 1,
na.rm = TRUE) +
theme(legend.position = "none")
I have the following data frame which contains 4 columns of data in addition to the vector of labels c.
Time <-c(1:4)
d<-data.frame(Time,
x1= rpois(n = 4, lambda = 10),
x2= runif(n = 4, min = 1, max = 10),
x3= rpois(n = 4, lambda = 5),
x4= runif(n = 4, min = 1, max = 5),
c=c(1,1,2,3))
I would like to use ggplot to plot the 4 curves x1, ..., x4 on top of each other, where each curve is colored according to its label. So curves x1 and x2 get the same color since they have the same label, whereas curves x3 and x4 get different colors.
I did the following
d %>% pivot_longer(-c(Time,x1,x2,x3,x4))%>%
rename(class=value) %>% select(-name) %>%
pivot_longer(-c(Time,class)) %>%
mutate(Label=ifelse(Time==max(Time,na.rm = T),name,NA),
Label=ifelse(duplicated(Label),NA,Label)) %>%
ggplot(aes(x=Time,y=value,color=factor(class),group=name))+
geom_line()+
labs(color='class')+
scale_color_manual(values=c('red','blue','green'))+
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE,show.legend = F,color='black')
but I don't get the plot I need: the resulting curves are not colored according to the label. I want x1 and x2 in red, x3 in blue and x4 in green.
To add: I would like to obtain the same plot above in the following general case, where I can't add the vector c to the data frame as length(c) is not equal to length(x1)=...=length(x4)
Time <-c(1:5)
d<-data.frame(Time,
x1= rpois(n = 5, lambda = 10),
x2= runif(n = 5, min = 1, max = 10),
x3= rpois(n = 5, lambda = 5),
x4= runif(n = 5, min = 1, max = 5))
and c=c(1,1,2,3)
As you point out in your comments, it is only possible to put the vector of colors as a column in the original data.frame because it happens to be square, but this is a dangerous way to store the information because the colors really belong to the columns rather than the rows. It's better to assign the colors separately and then join into the long format data by variable name prior to plotting.
Below is an example of how I'd do this with your data.
First, prepare the data without the color mapping for each variable; we'll do that next:
# load necessary packages
library(tidyverse)
library(ggrepel)
# set seed to make simulated data reproducible
set.seed(1)
# simulate data
Time <-c(1:4)
d <- data.frame(Time,
x1 = rpois(n = 4, lambda = 10),
x2 = runif(n = 4, min = 1, max = 10),
x3 = rpois(n = 4, lambda = 5),
x4 = runif(n = 4, min = 1, max = 5))
Next, make a separate data.frame that maps the color grouping to the variable names. At some point you'll want to make this a factor (i.e. discrete rather than continuous) to map it to color so I just do it here but it can be done later in the ggplot call if you prefer. Per your request, this solution easily scales with your dataset without needing to manually set each level, but it requires that your vector of color mappings is in the same order and the same length as the variable names in d unless you have some other way to establish that relationship.
# create separate df with color groupings for variable in d
color_grouping <- data.frame(var = names(d)[-1],
color_group = factor(c(1, 1, 2, 3)))
Then you pivot_longer and do a join to merge the color mapping with the data for plotting.
# pivot d to long and merge in color codes
d_long <- d %>%
pivot_longer(cols = -Time, names_to = "var", values_to = "value") %>%
left_join(color_grouping, by = "var")
# inspect final table prior to plotting to confirm color mappings
head(d_long, 4)
# # A tibble: 4 x 4
# Time var value color_group
# <int> <chr> <dbl> <fct>
# 1 1 x1 8 1
# 2 1 x2 1.56 1
# 3 1 x3 4 2
# 4 1 x4 4.97 3
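The same lookup-by-name idea can be sketched without the tidyverse, here for the 5-row case from your edit, where the labels cannot be stored as a column of d. The values are made up and fixed (not random), and cls and long are hypothetical names, so treat this as an illustration rather than a drop-in replacement:

```r
# Wide data with 5 time points; fixed values keep the sketch reproducible
d <- data.frame(Time = 1:5,
                x1 = c(8, 10, 7, 11, 14),
                x2 = c(1.5, 3.2, 6.1, 9.0, 4.4),
                x3 = c(4, 5, 3, 6, 2),
                x4 = c(4.9, 1.7, 3.1, 2.2, 3.8))

# One class label per variable (column), not per row
cls <- c(x1 = 1, x2 = 1, x3 = 2, x4 = 3)

# stack() reshapes the value columns to long format (columns: values, ind)
long <- stack(d[names(cls)])
names(long) <- c("value", "var")
long$var  <- as.character(long$var)
long$Time <- rep(d$Time, times = length(cls))

# Look up each row's class by its variable name
long$color_group <- factor(unname(cls[long$var]))

table(long$var, long$color_group)
```

Because the labels are matched by name, the number of rows in d no longer has to equal the number of labels.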
Finally, generate line plot where color is mapped to the color_group variable. To ensure you get one line per original variable you also need to set group = var. For more info on this check the documentation on grouping.
# plot data adding labels for each line
p <- d_long %>%
ggplot(aes(x = Time, y = value, group = var, color = color_group)) +
geom_line() +
labs(color='class') +
scale_color_manual(values=c('red','blue','green')) +
geom_label_repel(aes(label = var),
data = d_long %>% slice_max(order_by = Time, n = 1),
nudge_x = 1.5,
na.rm = TRUE,
show.legend = F,
color='black')
p
This produces this plot:
In your comment you suggested wanting to separate out and stack the plots. I'm not sure I fully understood, but one way to accomplish this is with faceting.
For example if you wanted to facet out separate panels by color_group, you could add this line to the plot above:
p + facet_grid(rows = "color_group")
Which gives this plot:
Note that when passed this way the faceting variable must be put in quotes; alternatively, you can write rows = vars(color_group).
You were on the right path, but you need a little bit of a different structure to use ggplot:
# delete old color column
d$c <- NULL
# reshape df
plot.d <- reshape2::melt(d, id.vars = c("Time"))
# create new, correct color column
plot.d$c <- NA
plot.d$c[plot.d$variable == "x1"] <- 1
plot.d$c[plot.d$variable == "x2"] <- 1
plot.d$c[plot.d$variable == "x3"] <- 2
plot.d$c[plot.d$variable == "x4"] <- 3
# plot
ggplot(plot.d, aes(x=Time, y=value, color=as.factor(c), group = variable))+
geom_line() +
labs(color='class')+
scale_color_manual(values=c('red','blue','green'))
Note that I omitted the labels for brevity, but you can add them back in using the same logic. The code above gives the following result:
Here is a solution for how I understood your question.
The data frame is brought into long format, and the variable c is recreated with mutate / case_when using the number codes you used.
I have set a seed for better reproducibility.
library(tidyverse)
library(ggrepel)
set.seed(1)
# YOUR DATA
Time <- c(1:4)
d <- data.frame(Time,
x1 = rpois(n = 4, lambda = 10),
x2 = runif(n = 4, min = 1, max = 10),
x3 = rpois(n = 4, lambda = 5),
x4 = runif(n = 4, min = 1, max = 5),
c = c(1, 1, 2, 3)
)
d %>%
pivot_longer(cols = x1:x4) %>% # make it long
mutate(c = as.factor(case_when( # replace consistently
name == "x1" | name == "x2" ~ 1, # according to YOUR DATA
name == "x3" ~ 2,
name == "x4" ~ 3
))) %>%
mutate(
Label = ifelse(Time == max(Time, na.rm = T), name, NA),
Label = ifelse(duplicated(Label), NA, Label)
) %>%
ggplot(aes(x = Time, y = value, color = c, group = name)) +
geom_line() +
labs(color = "class") +
scale_color_manual(values = c("red", "blue", "green")) + # YOUR CHOICE
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE, show.legend = F, color = "black"
)
ADDED
You could leave the c out and color according to name.
The color code was necessary because you wanted 2 names with the same color. If that is not needed, the following code can do it.
d %>%
pivot_longer(cols = x1:x4) %>% # make it long
mutate(
Label = ifelse(Time == max(Time, na.rm = T), name, NA),
Label = ifelse(duplicated(Label), NA, Label)
) %>%
ggplot(aes(x = Time, y = value, color = name, group = name)) +
geom_line() +
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE, show.legend = F, color = "black"
)
I'm trying to plot Current_Prices and Test_Prices as the fill in ggplot, but I cannot figure out which function to use to combine those two columns after running summarise in dplyr.
carrier%>%
group_by(!!category) %>%
summarise(Current_Prices = mean(original_percent_higher[original_percent_higher>0], na.rm = TRUE),
Test_Prices = mean(Percent_Higher[Percent_Higher>0], na.rm = TRUE)) %>%
ggplot(aes_string(input$Category, "Current_Prices")) +
geom_bar(stat = "identity", position = "dodge", na.rm = TRUE) +
geom_text(aes(label = scales::percent(Current_Prices)), size = 3.2, vjust = -0.5,
position = position_dodge(width = 1), na.rm = TRUE)
What I would like to create between summarise and ggplot is something like:
Class Version Mean
a current 1
a test 2
b current 3
b test 4
So then I can run ggplot with Class as x, Mean as y, and Version as fill.
What I have right now after summarise is something like:
Class Current Test
a 1 2
b 3 4
c 5 6
Currently, the values are in 'wide' format; for ggplot you want them in 'long' format, as you've stated. The following will convert the output of summarise you've described to the format you want. The creation of data_wide is just for illustration purposes in this example. You can of course also insert this into your existing statement using the pipe (%>%) operator:
library(tidyr)
data_wide <- data.frame(Class = c('a', 'b', 'c'),
Current = c(1, 3, 5),
Test = c(2, 4, 6))
data_long <- gather(data_wide, Version, Mean, Current:Test, factor_key=TRUE)
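For what it's worth, newer versions of tidyr offer pivot_longer() for the same reshaping, and gather() is marked as superseded there. A sketch of the equivalent call, assuming tidyr 1.0.0 or later and the same illustrative data_wide:

```r
library(tidyr)

# Same illustrative wide data as above
data_wide <- data.frame(Class   = c('a', 'b', 'c'),
                        Current = c(1, 3, 5),
                        Test    = c(2, 4, 6))

# Column names go to 'Version', cell values go to 'Mean'
data_long <- pivot_longer(data_wide,
                          cols      = c(Current, Test),
                          names_to  = "Version",
                          values_to = "Mean")

data_long
```

One difference to be aware of: Version comes back as a character column here, whereas gather(..., factor_key = TRUE) returns a factor.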
DMR's answer points in the right direction but is incomplete as it has left out the plotting part.
library(tidyr)
library(ggplot2)
data.frame(Class = c('a', 'b', 'c'),
Current = c(1, 3, 5),
Test = c(2, 4, 6)) %>%
# reshape from wide to long format
gather(Version, Mean_Price, -Class) %>%
# here comes the plotting stuff
ggplot(aes(Class, Mean_Price, fill = Version)) +
geom_bar(stat = "identity", position = "dodge", na.rm = TRUE) +
geom_text(aes(label = scales::percent(Mean_Price)), size = 3.2, vjust = -0.5,
position = position_dodge(width = 1), na.rm = TRUE)
I have a data set of 600 respondents. For each respondent I want to track the level of an indicator over time using a line graph. The indicator is divided in 5 levels - 0,1,2,3,4, and I have the indicator for 4 years - 2014, 2015, 2016 and 2017.
So I want the sample number on y-axis, and one line representing each respondent across the 4 time periods indicating the level of indicator. How can this be possible? I appreciate your help!
I want to further facet out this graph using deciles of income in the income variable.
Sample dataframe:
df <- data.frame(c(1:5), c(0, 1, 0, 2, 2), c(1, 2, 2, 4, 4), c(2, 3, 3, 4, 4), c(3,3,3,4,4), c(10000, 200000, 15000, 40000, 350000))
colnames(df) <- c("sample_no", paste("indicator_level_", 14:17, sep=""), "annual_income")
Would this be acceptable to you?
library(ggplot2)
library(dplyr)
library(tidyr)
library(magrittr)
I only use 3 different clusters (instead of 10) based on quantile because of the number of rows.
df2 <- df %>%
mutate(quantile = ntile(annual_income, 3)) %>%
gather(indicator_level_14, indicator_level_15, indicator_level_16, indicator_level_17,
key = "Indicator", value = "Value")
ggplot(df2, aes(x = Indicator, y = Value, color = as.factor(sample_no))) +
geom_line(aes(group = sample_no)) +
facet_wrap(~quantile) +
theme(axis.text.x = element_text(angle = 50, hjust = 1)) +
labs(color = "Sample")