Order the plotting of data by the contents of a date column - r

I have a dataset as follows:
df = data.frame(name = c('Ger1', 'Ger2', 'Ger3', 'Ger4', 'Ger5', 'Ger6'),
issued = c('UKS', 'USD', 'UKS', 'UKS', 'USD', 'USD'),
mat = c('2024-01-31', '2023-01-31', '2026-10-22', '2022-07-22', '2029-01-31', '2025-06-07'),
volume = c(0.476, 0.922, 0.580, 1.259, 0.932, 0.417))
I currently plot (and filter) the data using the following code:
plot1<- ggplot(subset(df, issued == "UKS")) +
geom_bar(stat="identity", aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),
panel.background = element_rect(fill='white',color='black'),
legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
I'd like to be able to order this data using the 'mat' column as guide, namely with the data that has the earliest 'mat' date at the top of the Y axis and the most distant 'mat' date at the bottom. Does anyone have any advice on how to achieve this?
Edit: I use grid arrange to plot it against another chart.
grid.arrange(plot1,plot2,ncol=2)
Sadly I get the following error:
Error in `-.POSIXt`(Maturity) : unary '-' is not defined for "POSIXt" objects

You can use stats::reorder() inside aes() to reorder the bars. If a factor is supplied you don't need to supply a FUN, but for a continuous variable like Date you can specify the way to sort. In your data (although you didn't post it this way in the question), it seems your mat variable is POSIXlt. This format cannot be directly operated on as a numeric. Instead, I suggest using POSIXct and then it will work. See ?stats::reorder for more info on how to control this. Another option is to set levels of the factor in the data before passing to ggplot() which might be a better option if you have complex sorting to do.
library(tidyverse)
df <- data.frame(name = c('Ger1', 'Ger2', 'Ger3', 'Ger4', 'Ger5', 'Ger6'),
issued = c("UKS", "USD", "UKS", "UKS", "USD", "USD"),
mat = c("2024-01-31", "2023-01-31", "2026-10-22", "2022-07-22", "2029-01-31", "2025-06-07"),
volume = c(0.476, 0.922, 0.580, 1.259, 0.932, 0.417))
df %>%
mutate(mat = as.POSIXct(mat)) %>%
filter(issued == "UKS") %>%
# mutate(name = fct_reorder(.f = name, .x = mat)) %>% ggplot(aes(volume, name)) +
ggplot(aes(x = volume, y = reorder(x = name, X = mat, FUN = sort))) +
geom_col(fill = "#1170aa") +
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name") +
theme(
title = element_text(size = 12),
panel.background = element_rect(fill = 'white', color = 'black'),
legend.position = 'right'
)
Created on 2022-02-07 by the reprex package (v2.0.1)
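The other option mentioned above, setting the factor levels in the data before plotting, might look roughly like the sketch below. It uses only base R, and `uks` is a name introduced here for illustration. ggplot2 draws the first level of a discrete y axis at the bottom, so the levels are sorted latest-first to put the earliest date at the top.

```r
df <- data.frame(
  name   = c("Ger1", "Ger2", "Ger3", "Ger4", "Ger5", "Ger6"),
  issued = c("UKS", "USD", "UKS", "UKS", "USD", "USD"),
  mat    = as.Date(c("2024-01-31", "2023-01-31", "2026-10-22",
                     "2022-07-22", "2029-01-31", "2025-06-07")),
  volume = c(0.476, 0.922, 0.580, 1.259, 0.932, 0.417)
)

# `uks` is a hypothetical name for the filtered data
uks <- df[df$issued == "UKS", ]

# Latest maturity first: the first level is drawn at the bottom of the y axis,
# so the earliest maturity ends up at the top
uks$name <- factor(uks$name,
                   levels = uks$name[order(uks$mat, decreasing = TRUE)])

levels(uks$name)
#> [1] "Ger3" "Ger1" "Ger4"
```

Passing `uks` to `ggplot(aes(x = volume, y = name))` should then place Ger4 (the earliest maturity) at the top. This avoids negating a date-time, which is the operation the `unary '-'` error complains about.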

So, I was able to fix the ordering issue by appending the mat column data to the name, and then adding:
+scale_y_discrete(limits=rev)
To the end of the ggplot code.

Related

Breaking up a large ggplot by category; subset shows no errors but plots no data

I have a very large dataset derived from a spreadsheet of the format below:
df = data.frame(name = c('Ger1', 'Ger2', 'Ger3', 'Ger4', 'Ger5', 'Ger6'),
issued = c('UKS', 'USD', 'UKS', 'UKS', 'USD', 'USD'),
mat = c('2024-01-31', '2023-01-31', '2026-10-22', '2022-07-22', '2029-01-31', '2025-06-07'),
volume = c(0.476, 0.922, 0.580, 1.259, 0.932, 0.417))
Currently, I plot all the data on one very long ggplot with the following code:
chart1<-ggplot(df)+geom_bar(stat="identity",aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),panel.background = element_rect(fill='white',color='black'),legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
Now while that worked for a while, given the size the dataset has grown to it is no longer feasible to use that way. Therefore I'd like to plot the data based on the contents of the "issued" column.
I first thought about a condition statement of the type:
if (df$issued == "UKS"){
chart1<-ggplot(df)+geom_bar(stat="identity",aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),panel.background = element_rect(fill='white',color='black'),legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
}
Unfortunately, it didn't work (although on closer inspection my logic wasn't particularly well thought out).
I then tried the subset() function, hoping it would let me plot only the data meeting my requirements, like so:
chart1<-ggplot(subset(df, 'issued' == "UKS"))+geom_bar(stat="identity",aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),panel.background = element_rect(fill='white',color='black'),legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
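This particular code didn't show up any errors, but the chart that was produced had no data on it at all. Does anyone have any ideas on how I can filter and plot this data?
You don't need quotes around column names in subset(). With the quotes, 'issued' == "UKS" compares two string literals, which is always FALSE, so every row is dropped and nothing is plotted.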
ggplot(subset(df, issued == "UKS")) +
geom_bar(stat="identity", aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),
panel.background = element_rect(fill='white',color='black'),
legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
Or use a tidyverse way of plotting:
library(tidyverse)
df %>% filter(issued == "UKS") %>%
ggplot() +
geom_bar(stat="identity", aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),
panel.background = element_rect(fill='white',color='black'),
legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
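To see why the quoted version produced an empty chart without any error, here is a minimal base R sketch on a cut-down copy of the data. The names `quoted`, `unquoted` and `by_issuer` are introduced here for illustration only.

```r
df <- data.frame(
  name   = c("Ger1", "Ger2", "Ger3"),
  issued = c("UKS", "USD", "UKS")
)

# A quoted name is a string literal: "issued" == "UKS" is a single FALSE,
# so subset() keeps no rows and raises no error
quoted <- subset(df, "issued" == "UKS")
nrow(quoted)
#> [1] 0

# An unquoted name is looked up as a column of df
unquoted <- subset(df, issued == "UKS")
nrow(unquoted)
#> [1] 2

# split() returns one data frame per value of `issued`,
# which can be handy when every group needs its own chart
by_issuer <- split(df, df$issued)
names(by_issuer)
#> [1] "UKS" "USD"
```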

R: Creating multiple maps using a loop through variable names

I'd like to create a map that shows the value of a variable for a given state. The dataset contains around a thousand variables and is at the state level, for about 100 years.
The code I have, which works, is:
plot_usmap(data = database, values = "var1") + scale_fill_continuous(
low = "white", high = "blue", na.value="light gray", name = "Title of graph", label = scales::comma
) + theme(legend.position = "right")
Now what I'd like to do is create this map for a list of about 15 variables and 10 years.
I'm usually a STATA user, and there I could define a variable list and then loop through the variable list. On page 7 of this document of "A Quick Introduction to R (for Stata Users)", I tried applying the following solution:
vars <- c("database$var1", "database$var2", "database$var3","database$var4", "database$var5", "database$var6", "database$var7", "database$var8", "database$var9", "database$var10", "database$var11", "database$var12")
for(var in vars) {
v <- get(var)
plot_usmap(data = database, values = "v") +
scale_fill_continuous(low = "white", high = "blue", na.value="light gray", name = "v", label = scales::comma) + theme(legend.position = "right")}
With this code, I get the error "Error in get(var) : object 'database$var1' not found", even though view(database$var1) displays the column. The next problem is that I'd like the name of the graph to be the label of the variable rather than the variable name. In the example above, I'd restricted the whole data to only include 1 year, so if there's a way to set the code up so that I could use the whole database but map only select years, that would be great.
Any insights would be appreciated! I read that in R, "for" isn't used as much, so if there is a better way to do it, please let me know.
Basically it's not that different in R. First, there is no need to use get(), and in general it should be avoided. Second, while for loops are fine, the more R-ish way would be to use lapply. Especially when making plots via ggplot2 it is recommended to use lapply.
Making use of some fake example data to mimic your database:
library(usmap)
library(ggplot2)
# Example data
database <- statepop
names(database) <- c("fips", "abbr", "full", "var1")
database$var2 <- database$var1
vars <- c("var1", "var2")
lapply(vars, function(x) {
plot_usmap(data = database, values = x) +
scale_fill_continuous(
low = "white", high = "blue", na.value="light gray", name = "Title of graph", label = scales::comma
) +
theme(legend.position = "right") +
labs(title = x)
})
#> [[1]]
#>
#> [[2]]
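As a small illustration of why get() failed in the question, the sketch below uses a made-up two-row `database`. get() takes the name of an object, and "database$var1" is an expression, not a name. Indexing with `[[` and a column name held in a string does what was intended. Note also that a ggplot created inside a for loop is only displayed if it is wrapped in print().

```r
# Toy stand-in for the real data (values are made up)
database <- data.frame(state = c("AK", "AL"), var1 = c(1, 2), var2 = c(3, 4))

# get() looks for an object literally called "database$var1" and fails
res <- try(get("database$var1"), silent = TRUE)
inherits(res, "try-error")
#> [1] TRUE

# Keep plain column names and index the data frame with them
vars <- c("var1", "var2")
for (v in vars) {
  col <- database[[v]]     # same as database$var1, database$var2
  print(range(col))        # explicit print() is needed inside a loop
}
```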
EDIT Assuming that your data contains a column with years, I would suggest wrapping the plotting code inside a function which takes your database, a vector of vars and the desired year as arguments. But there are other approaches, and which works best depends on your desired result.
library(usmap)
library(ggplot2)
library(labelled)
# Example data
database <- statepop
names(database) <- c("fips", "abbr", "full", "var1")
database$year <- 2015
database <- rbind(database, transform(database, year = 2020))
var_label(database$var1) <- "Population"
vars <- c("var1")
names(vars) <- vars
map_vars <- function(.data, vars, year) {
  lapply(vars, function(x, year) {
    # Read the label before filtering, as row subsetting may drop attributes
    label <- var_label(.data[[x]])
    # Keep only the rows for the requested year and plot those
    .data <- .data[.data$year == year, ]
    plot_usmap(data = .data, values = x) +
      scale_fill_continuous(
        low = "white", high = "blue", na.value = "light gray", name = "Title of graph", label = scales::comma
      ) +
      theme(legend.position = "right") +
      labs(title = paste(label, "in", year))
  }, year = year)
}
map_vars(database, vars, 2015)
#> $var1
map_vars(database, vars, 2020)
#> $var1
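For the case of about 15 variables and 10 years, one possible approach (a sketch, not the only way) is to build every variable/year pair first and then map over the pairs. In the example below `make_title` is a hypothetical stand-in that only builds the title; in practice its body would be replaced by the plot_usmap() call.

```r
vars  <- c("var1", "var2")
years <- c(2015, 2020)

# One row per (variable, year) pair; the first column varies fastest
combos <- expand.grid(var = vars, year = years, stringsAsFactors = FALSE)

# Hypothetical stand-in for the plotting code: returns the title only
make_title <- function(var, year) paste(var, "in", year)

# Map() walks over both columns in parallel
titles <- Map(make_title, combos$var, combos$year)
unlist(titles, use.names = FALSE)
#> [1] "var1 in 2015" "var2 in 2015" "var1 in 2020" "var2 in 2020"
```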

ggplot2 - Two color series in area chart

I've got a question regarding an edge case with ggplot2 in R.
They don't like you adding multiple legends, but I think this is a valid use case.
I've got a large economic dataset with the following variables.
year = year of observation
input_type = *labor* or *supply chain*
input_desc = specific type of labor (eg. plumbers OR building supplies respectively)
value = percentage of industry spending
And I'm building an area chart over approximately 15 years. There are 39 different input descriptions, so I'd like the user to see the two major components (internal employee spending OR outsourcing/supply spending) in two major color brackets (say green and blue), but ggplot won't let me group my colors in that way.
Here are a few things I tried.
Junk code to reproduce
spec_trend_pie<- data.frame("year"=c(2006,2006,2006,2006,2007,2007,2007,2007,2008,2008,2008,2008),
"input_type" = c("labor", "labor", "supply", "supply", "labor", "labor","supply","supply","labor","labor","supply","supply"),
"input_desc" = c("plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck"),
"value" = c(1,2,3,4,4,3,2,1,1,2,3,4))
spec_broad <- ggplot(data = spec_trend_pie, aes(y = value, x = year, group = input_type, fill = input_desc)) + geom_area()
Which gave me
Error in f(...) : Aesthetics can not vary with a ribbon
And then I tried this
sff4 <- ggplot() +
geom_area(data=subset(spec_trend_pie, input_type="labor"), aes(y=value, x=variable, group=input_type, fill= input_desc)) +
geom_area(data=subset(spec_trend_pie, input_type="supply_chain"), aes(y=value, x=variable, group=input_type, fill= input_desc))
Which gave me this image...so closer...but not quite there.
To give you an idea of what is desired, here's an example of something I was able to do in GoogleSheets a long time ago.
It's a bit of a hack but forcats might help you out. I did a similar post earlier this week:
How to factor sub group by category?
First some base data
library(tidyverse)
set.seed(123)
raw_data <-
tibble(
x = rep(1:20, each = 6),
rand = sample(1:120, 120) * (x/20),
group = rep(letters[1:6], times = 20),
cat = ifelse(group %in% letters[1:3], "group 1", "group 2")
) %>%
group_by(group) %>%
mutate(y = cumsum(rand)) %>%
ungroup()
Now, use factor levels to create gradients within colors
df <-
raw_data %>%
# create factors for group and category
mutate(
group = fct_reorder(group, y, max),
cat = fct_reorder(cat, y, max) # ordering in the stack
) %>%
arrange(cat, group) %>%
mutate(
group = fct_inorder(group), # takes the category into account first
group_fct = as.integer(group), # factor as integer
hue = as.integer(cat)*(360/n_distinct(cat)), # base hue values
light_base = 1-(group_fct)/(n_distinct(group)+2), # trust me
light = floor(light_base * 100) # new L value for hcl()
) %>%
mutate(hex = hcl(h = hue, l = light))
Create a lookup table for scale_fill_manual()
area_colors <-
df %>%
distinct(group, hex)
Lastly, make your plot
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "stack") +
scale_fill_manual(
values = area_colors$hex,
labels = area_colors$group
)
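The core idea, one hue per category with lightness varying inside it, can also be sketched in base R for the small example from the question. `shades` is a hypothetical helper, and the hue, chroma and lightness values are assumptions picked for illustration rather than tuned choices.

```r
# Hypothetical helper: n shades of one hue, from light to dark
shades <- function(hue, n) {
  hcl(h = hue, c = 60, l = seq(85, 35, length.out = n))
}

labor  <- shades(120, 2)  # greenish hues for the two labor inputs
supply <- shades(240, 2)  # bluish hues for the two supply inputs

# Named vector, suitable for scale_fill_manual(values = fill_values)
fill_values <- c(plumber = labor[1], manager = labor[2],
                 pipe = supply[1], truck = supply[2])
fill_values
```

Because the vector is named by `input_desc`, each colour is matched to its level by name rather than by position.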

Alter name of trace in legend after plot is created (by package) in ggplot2

I'm wondering if there is any easy way to change the name in a legend (given using the colour aesthetic) on a ggplot after the plot is created. I know this feels a bit hacky and would normally be changed in the data or when the plot is created, but I want to change the label on a plot that is created by another package, and there's no option in the package to change it.
I could obviously copy the function and save my own version and change it, but I just want to change one thing so it seems neater if I can just do it afterwards.
Here is an example with some dummy data, basically I want to relabel the Mean and Median timeseries that come out of fasstr's plot_daily_stats to "Modelled Mean" and "Modelled Median" so they cannot be confused with the observed mean which I am manually adding.
library(fasstr)
library(tibble)
library(ggplot2)
#create some fake data
df <- tibble(Date = seq.Date(from = as.Date("1991-01-01"), as.Date("1997-12-31"),
by = "day"),
DayOfYear = as.numeric(format(Date, "%j")),
Value = runif(2557,0,1) + 50 + (cos((1/60)*DayOfYear)+4))
obsdf <- tibble(Date = seq.Date(from = as.Date("1900-01-01"), as.Date("1900-12-31"),
by = "day"),
DayOfYear = as.numeric(format(Date, "%j")),
Value = runif(365,0,1) + 51 + (cos((1/60)*DayOfYear)+4))
# create plot using fasstr package
plt1<- fasstr::plot_daily_stats(df)
# add my own trace. I also want to rename the trace "Mean" to
# "Modelled Mean" to avoid confusion (and same with Median)
plt1$Daily_Statistics +
geom_line(data = obsdf, aes(x = Date, y = Value, colour = "Observed Mean"))+
scale_colour_manual(values = c("red", "black","blue"))
The names are given in fasstr as hard coded names:
daily_plots <- ... +
ggplot2::geom_line(ggplot2::aes(y = Median, colour = "Median")) +
ggplot2::geom_line(ggplot2::aes(y = Mean, colour = "Mean"))
No hacking needed, just add labels to your manual scale.
plt1$Daily_Statistics +
geom_line(data = obsdf, aes(x = Date, y = Value, colour = "Observed Mean"))+
scale_colour_manual(labels = c("Modelled Mean","Modelled Median","Observed Mean"),
values = c("red", "black","blue"))
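The unnamed vectors above rely on the legend entries sorting alphabetically ("Mean", "Median", "Observed Mean"). A slightly more defensive variant, sketched below, names both vectors so that each colour and label is tied to the original entry. The plot `p` is a toy stand-in for the fasstr output, not the real thing, and `p2` is a name introduced here.

```r
library(ggplot2)

d <- data.frame(x = 1:3, y = c(1, 2, 3))

# Toy stand-in for a package plot with hard-coded colour names
p <- ggplot(d, aes(x, y)) +
  geom_line(aes(colour = "Mean")) +
  geom_line(aes(y = y + 1, colour = "Median")) +
  geom_line(aes(y = y + 2, colour = "Observed Mean"))

# Names tie each colour and each new label to its original legend entry
p2 <- p + scale_colour_manual(
  values = c("Mean" = "red", "Median" = "black", "Observed Mean" = "blue"),
  labels = c("Mean" = "Modelled Mean", "Median" = "Modelled Median",
             "Observed Mean" = "Observed Mean")
)
```

If the package plot already carries its own colour scale, ggplot2 will report that the new scale replaces the existing one.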

Add text labels to a ggplot2 mosaic plot

Using the following data:
Category <- c("Bankpass", "Bankpass", "Bankpass", "Moving", "Moving")
Subcategory <- c("Stolen", "Lost", "Login", "Address", "New contract")
Weight <- c(10,20,13,40,20)
Duration <- as.character(c(0.2,0.4,0.5,0.44,0.66))
Silence <- as.character(c(0.1,0.3,0.25,0.74,0.26))
df <- data.frame(Category, Subcategory, Weight, Duration, Silence)
Which I use to create the following mosaic plot:
library (ggplot2)
library (ggmosaic)
g <- ggplot(data = df) +
geom_mosaic(aes(weight = Weight, x = product(Category), fill = Duration),
offset = 0, na.rm = TRUE) +
theme(axis.text.x = element_text(angle = -25, hjust = .1)) +
theme(axis.title.x = element_blank()) +
scale_fill_manual(values = c("#e8f5e9", "#c8e6c9", "#a5d6a7", "#81c784", "#66bb6a"))
This works; however, I would like to include text labels on the elements of the graph (showing e.g. "Stolen", "Lost", etc.).
However, when I do:
g + geom_text(x = Category, y = Subcategory, label = Weight)
I get the following error:
Error in UseMethod("rescale") : no applicable method for 'rescale' applied to an object of class "character"
Any thoughts on what goes wrong here?
Here is my attempt. The x-axis maps a discrete variable (Category), so you cannot pass it to geom_text() directly; you need numeric positions on the x-axis, and likewise numeric y positions for the labels. To get numeric values for both dimensions, I accessed the data frame behind your graphic. With the ggmosaic package there is one such data frame in this case, and you can get it with ggplot_build(). From it you can calculate x and y positions (e.g., from xmin and xmax). That is the good news. The bad news is that this data frame contains no information about Subcategory, which you need for the labels.
We can overcome this by joining the data frame above with the original data. To make the join possible, I calculated the proportion in both the original data and the built data; the values are purposely converted to character so that they can be matched. temp is the data set you need in order to add labels.
library(dplyr)
library(ggplot2)
library(ggmosaic)
# Add proportion for each and convert to character for join
df <- group_by(df, Category) %>%
mutate(prop = as.character(round(Weight / sum(Weight),3)))
# Add proportion for each and convert to character.
# Get x and y values for positions
# Use prop for join
temp <- ggplot_build(g)$data %>%
as.data.frame %>%
transmute(prop = as.character(round(ymax - ymin, 3)),
x.position = (xmax + xmin) / 2,
y.position = (ymax + ymin) / 2) %>%
right_join(df)
g + geom_text(x = temp$x.position, y = temp$y.position, label = temp$Subcategory)
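To make the arithmetic behind the label positions concrete, here is a base R sketch that computes each tile's share and vertical midpoint by hand. It assumes tiles are stacked in row order within each category, which need not match the order ggmosaic actually uses, so treat it as an illustration of the calculation rather than a replacement for ggplot_build(). The column name `y.mid` is introduced here.

```r
df <- data.frame(
  Category    = c("Bankpass", "Bankpass", "Bankpass", "Moving", "Moving"),
  Subcategory = c("Stolen", "Lost", "Login", "Address", "New contract"),
  Weight      = c(10, 20, 13, 40, 20)
)

# Share of each subcategory within its category (the tile height)
df$prop <- df$Weight / ave(df$Weight, df$Category, FUN = sum)

# Vertical midpoint of each tile: cumulative height minus half its own height
# (assumes row order is the stacking order)
df$y.mid <- ave(df$prop, df$Category, FUN = function(p) cumsum(p) - p / 2)

df[, c("Subcategory", "prop", "y.mid")]
```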
I think you are looking for something like this
library(ggplot2)
library(ggmosaic)
Your data:
Category <- c("Bankpass", "Bankpass", "Bankpass", "Moving", "Moving")
Subcategory <- c("Stolen", "Lost", "Login", "Address", "New contract")
Weight <- c(10,20,13,40,20)
Duration <- as.character(c(0.2,0.4,0.5,0.44,0.66))
Silence <- as.character(c(0.1,0.3,0.25,0.74,0.26))
mydf <- data.frame(Category, Subcategory, Weight, Duration, Silence)
ggplot(data = mydf) +
geom_mosaic(aes( x = product(Duration, Subcategory), fill=factor(Duration)), na.rm=TRUE) +
theme(axis.text.x=element_text(angle=-25, hjust= .1)) +
labs(x="Subcategory", title='f(Duration, Subcategory | Category)') +
facet_grid(Category~.) +
guides(fill=guide_legend(title = "Duration", reverse = TRUE))
The output is:
This is about the best you can do with the ggmosaic package. You could also try other packages.
Good luck for your project work ;-)

Resources