I'd like to create a map that shows the value of a variable for each state. The dataset contains around a thousand variables and is at the state level, covering about 100 years.
The code I have that works is:
plot_usmap(data = database, values = "var1") + scale_fill_continuous(
low = "white", high = "blue", na.value="light gray", name = "Title of graph", label = scales::comma
) + theme(legend.position = "right")
Now what I'd like to do is create this map for a list of about 15 variables and 10 years.
I'm usually a STATA user, and there I could define a variable list and then loop through the variable list. On page 7 of this document of "A Quick Introduction to R (for Stata Users)", I tried applying the following solution:
vars <- c("database$var1", "database$var2", "database$var3","database$var4", "database$var5", "database$var6", "database$var7", "database$var8", "database$var9", "database$var10", "database$var11", "database$var12")
for(var in vars) {
v <- get(var)
plot_usmap(data = database, values = "v") +
scale_fill_continuous(low = "white", high = "blue", na.value="light gray", name = "v", label = scales::comma) + theme(legend.position = "right")}
With this code, I get the error "Error in get(var) : object 'database$var1' not found". When I try view(database$var1), it appears. The next problem is that I'd like the name of the graph to be the label of the variable rather than the variable name. In the example above, I'd restricted the whole data to only include 1 year, so if there's a way to set the code up so that I could use the whole database but map only select years, that would be great.
Any insights would be appreciated! I read that in R, "for" isn't used as much, so if there is a better way to do it, please let me know.
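Basically it's not that different in R. First, there is no need to use get, and in general it should be avoided. Second, while for loops are fine, the more R-ish way would be to use lapply. Especially when making plots via ggplot2, lapply is recommended: it returns the plots as a list, whereas a ggplot object created inside a for loop is not displayed unless it is explicitly passed to print().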
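To illustrate why get() fails here, a minimal sketch using a toy data frame (the two-column `database` below is made up for the example): get() looks up an object by its name, and "database$var1" is an expression rather than the name of an object, so the lookup fails. Passing bare column names as strings and indexing with [[ ]] avoids the problem.

```r
# Toy stand-in for the real data (an assumption for this example)
database <- data.frame(var1 = 1:3, var2 = 4:6)
vars <- c("var1", "var2")

# get() searches for an object literally named "database$var1", which does not exist
res <- try(get("database$var1"), silent = TRUE)
failed <- inherits(res, "try-error")

# [[ ]] accepts the column name as a string, so it works inside a loop or lapply
cols <- lapply(vars, function(v) database[[v]])
```

The same idea is why the plotting code can receive the column name directly: plot_usmap() takes the name of the values column as a string.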
Making use of some fake example data to mimic your database:
library(usmap)
library(ggplot2)
# Example data
database <- statepop
names(database) <- c("fips", "abbr", "full", "var1")
database$var2 <- database$var1
vars <- c("var1", "var2")
lapply(vars, function(x) {
  plot_usmap(data = database, values = x) +
    scale_fill_continuous(
      low = "white", high = "blue", na.value = "light gray", name = "Title of graph", label = scales::comma
    ) +
    theme(legend.position = "right") +
    labs(title = x)
})
#> [[1]]
#>
#> [[2]]
EDIT Assuming that your data contains a column with years, I would suggest wrapping the plotting code inside a function which takes your database, a vector of vars, and the desired year as arguments. There are other approaches, and which works best depends on your desired result.
library(usmap)
library(ggplot2)
library(labelled)
# Example data
database <- statepop
names(database) <- c("fips", "abbr", "full", "var1")
database$year <- 2015
database <- rbind(database, transform(database, year = 2020))
var_label(database$var1) <- "Population"
vars <- c("var1")
names(vars) <- vars
map_vars <- function(.data, vars, year) {
  # Keep only the rows for the requested year
  year_data <- .data[.data$year == year, ]
  lapply(vars, function(x) {
    plot_usmap(data = year_data, values = x) +
      scale_fill_continuous(
        low = "white", high = "blue", na.value = "light gray", name = "Title of graph", label = scales::comma
      ) +
      theme(legend.position = "right") +
      # Read the label from the unfiltered data, since row subsetting can drop the label attribute
      labs(title = paste(var_label(.data[[x]]), "in", year))
  })
}
map_vars(database, vars, 2015)
#> $var1
map_vars(database, vars, 2020)
#> $var1
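Since the question asks for about 15 variables across 10 years, one possible way to cover every combination is to build the grid of (variable, year) pairs first and then loop over it once. The following is only a sketch on made-up toy data (the `state`, `year`, `var1`, and `var2` columns are assumptions); it collects the filtered rows and a title for each pair, and in real use the body of the function would call plot_usmap() instead.

```r
# Toy stand-in for the state-level data (column names are assumptions)
database <- data.frame(
  state = rep(c("AL", "AK"), times = 2),
  year  = rep(c(2015, 2020), each = 2),
  var1  = c(1, 2, 3, 4),
  var2  = c(10, 20, 30, 40)
)

# Every (variable, year) pair to map
combos <- expand.grid(var = c("var1", "var2"), year = c(2015, 2020),
                      stringsAsFactors = FALSE)

# One list element per pair: the filtered rows plus a title.
# In real use this is where plot_usmap(data = d, values = v) would go.
jobs <- Map(function(v, y) {
  d <- database[database$year == y, c("state", v)]
  list(data = d, title = paste(v, "in", y))
}, combos$var, combos$year)

# Name each element so it can be retrieved as e.g. jobs$var2_2020
names(jobs) <- paste(combos$var, combos$year, sep = "_")
```

Keeping the combinations in a data frame makes it easy to drop pairs that are not needed before any plotting happens.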
I have dataset as follows:
df = data.frame(name = c('Ger1', 'Ger2', 'Ger3', 'Ger4', 'Ger5', 'Ger6'),
issued = c('UKS', 'USD', 'UKS', 'UKS', 'USD', 'USD'),
mat = c('2024-01-31', '2023-01-31', '2026-10-22', '2022-07-22', '2029-01-31', '2025-06-07'),
volume = c(0.476, 0.922, 0.580, 1.259, 0.932, 0.417))
I currently plot (and filter) the data using the following code:
plot1<- ggplot(subset(df, issued == "UKS")) +
geom_bar(stat="identity", aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),
panel.background = element_rect(fill='white',color='black'),
legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
I'd like to be able to order this data using the 'mat' column as guide, namely with the data that has the earliest 'mat' date at the top of the Y axis and the most distant 'mat' date at the bottom. Does anyone have any advice on how to achieve this?
Edit: I use grid arrange to plot it against another chart.
grid.arrange(plot1,plot2,ncol=2)
Sadly I get the following error:
Error in `-.POSIXt`(Maturity) : unary '-' is not defined for "POSIXt" objects
You can use stats::reorder() inside aes() to reorder the bars. If a factor is supplied you don't need to supply a FUN, but for a continuous variable like Date you can specify the way to sort. In your data (although you didn't post it this way in the question), it seems your mat variable is POSIXlt. This format cannot be directly operated on as a numeric. Instead, I suggest using POSIXct and then it will work. See ?stats::reorder for more info on how to control this. Another option is to set levels of the factor in the data before passing to ggplot() which might be a better option if you have complex sorting to do.
library(tidyverse)
df <- data.frame(name = c('Ger1', 'Ger2', 'Ger3', 'Ger4', 'Ger5', 'Ger6'),
issued = c("UKS", "USD", "UKS", "UKS", "USD", "USD"),
mat = c("2024-01-31", "2023-01-31", "2026-10-22", "2022-07-22", "2029-01-31", "2025-06-07"),
volume = c(0.476, 0.922, 0.580, 1.259, 0.932, 0.417))
df %>%
mutate(mat = as.POSIXct(mat)) %>%
filter(issued == "UKS") %>%
# mutate(name = fct_reorder(.f = name, .x = mat)) %>% ggplot(aes(volume, name)) +
ggplot(aes(x = volume, y = reorder(x = name, X = mat, FUN = sort))) +
geom_col(fill = "#1170aa") +
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name") +
theme(
title = element_text(size = 12),
panel.background = element_rect(fill = 'white', color = 'black'),
legend.position = 'right'
)
Created on 2022-02-07 by the reprex package (v2.0.1)
So, I was able to fix the ordering issue by appending the mat column data to the name, and then adding:
+scale_y_discrete(limits=rev)
To the end of the ggplot code.
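A possible alternative to appending the date to the name is to set the factor levels of name directly from mat. This is a sketch using only the three UKS rows from the example data; it relies on the fact that ggplot2 draws the first level of a discrete y axis at the bottom, so sorting latest-first puts the earliest date at the top.

```r
# The three UKS rows from the example data
df <- data.frame(
  name = c("Ger1", "Ger3", "Ger4"),
  mat  = as.Date(c("2024-01-31", "2026-10-22", "2022-07-22"))
)

# First level is drawn at the bottom of a discrete y axis,
# so order the levels from latest to earliest date
df$name <- factor(df$name, levels = df$name[order(df$mat, decreasing = TRUE)])
lvls <- levels(df$name)
```

With the levels set this way, aes(x = volume, y = name) needs no reorder() call and no scale_y_discrete(limits = rev).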
This is the code that I have right now, and it creates the graph that I want. It only covers well 'aa1' though, and I want to learn how to write a loop so that I can make this graph for all of my wells. Any ideas?
longer_raindata %>%
select(well, metal, level, smpl_mth) %>%
filter(well == 'aa1') %>%
ggplot(aes(metal, level, fill = smpl_mth))+
scale_fill_manual(values = c('plum', 'lightsteelblue', 'darkolivegreen', 'indianred'))+
geom_col(position = "dodge")+
labs(title = 'Metals in Well AA1',
x = 'Metals',
y = 'µg/l')
One option to achieve your desired result would be to put your code into a plotting function which takes one argument .well and adjust your filter statement to filter your data for that .well.
Then you could use e.g. lapply to loop over the unique wells in your dataset to get a list of plots for each well.
Using some fake example data:
library(ggplot2)
library(dplyr)
longer_raindata <- data.frame(
metal = LETTERS[1:3],
level = 1:6,
well = rep(c("aa1", "aa2"), each = 6),
smpl_mth = rep(letters[1:2], each = 3)
)
plot_fun <- function(.well) {
  longer_raindata %>%
    select(well, metal, level, smpl_mth) %>%
    filter(well == .well) %>%
    ggplot(aes(metal, level, fill = smpl_mth)) +
    scale_fill_manual(values = c('plum', 'lightsteelblue', 'darkolivegreen', 'indianred')) +
    geom_col(position = "dodge") +
    labs(title = paste0('Metals in Well ', toupper(.well)),
         x = 'Metals',
         y = 'µg/l')
}
lapply(unique(longer_raindata$well), plot_fun)
#> [[1]]
#>
#> [[2]]
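If the plots need to be looked up by well afterwards, naming the input vector gives a named list instead of [[1]], [[2]]. A minimal sketch follows; note that plot_fun() here is a stand-in that returns only the title string so the snippet runs without ggplot2, and the file names in the commented ggsave() loop are hypothetical.

```r
wells <- c("aa1", "aa2")  # in practice: unique(longer_raindata$well)

# Stand-in for the real plot_fun() (an assumption so this runs on its own)
plot_fun <- function(.well) paste0("Metals in Well ", toupper(.well))

# Naming the input vector names the output list
plots <- lapply(setNames(wells, wells), plot_fun)

# With real ggplot objects, each plot could then be written to disk, e.g.
# for (w in names(plots)) ggsave(paste0("well_", w, ".png"), plots[[w]])
```

After this, plots$aa1 or plots[["aa2"]] retrieves the result for a specific well.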
I am looking to create unique/individual reports for a given list of vendors. The ideal output format would be a separate html file with each vendor's information.
The issue is that I am having trouble wrapping my head around creating parameterized reports in RMarkdown. I have been taking a look at this link to understand how to loop/iterate through RMarkdown reports
To illustrate and share the logic of what I want to execute is the following:
for (vendor in vendor.name) {
rmarkdown::render('input.Rmd', params = list(vendor = vendor))
}
Where I then have a print out of:
Vendor-1.html, Vendor-2.html, …, Vendor-4.html, and vendor-4.html
Which then gets saved locally to my computer. The part I have trouble wrapping my head around is this: say we have a bar graph of sales by month; how would the parameter passing through the entire document know when to change the vendor number for a unique view?
If anyone can share an example either iris, mtcars, or any base dataset within R I would really appreciate it. Looking at how this workflow/logic would work because I am struggling to understand the concept.
To be specific, say I have this chunk of code here. How would params$vendor know to loop over another vendor if I am not calling it within the chunk? Within my dplyr verbs, in my filter, should I use MVNDR_NBR == qc_sales$vendor_number or params$vendor? This is the piece that has me most confounded.
sales_2021_stock <- qc_sales %>%
filter(FSCL_YR == 2021
, STR_NBR != '8119'
, MAPPED_ORD_SRC == 'QC'
, so_flg == 0
, YTLW_TY_LY_FLG == 'TY'
, MVNDR_NBR == '60031167'
, !SUB_DEPT_NBR %in% c('0025','0028')) %>%
group_by(MVNDR_NBR, MVNDR_NM, FSCL_WK_NBR, FSCL_YR) %>%
summarise(Sales = sum(ESVS_NET_SLS)) %>%
mutate(FSCL_YR = as.character(FSCL_YR)) %>%
collect()
sales_2020_stock <- qc_sales %>%
filter(FSCL_YR == 2020
, STR_NBR != '8119'
, MAPPED_ORD_SRC == 'QC'
, YTLW_TY_LY_FLG == 'LY'
, so_flg == 0
, MVNDR_NBR == '60031167'
, !SUB_DEPT_NBR %in% c('0025','0028')) %>%
group_by(MVNDR_NBR, MVNDR_NM, FSCL_WK_NBR, FSCL_YR) %>%
summarise(Sales = sum(ESVS_NET_SLS)) %>%
mutate(FSCL_YR = as.character(FSCL_YR)) %>%
collect()
sales_comp_line_stock <- rbind(sales_2021_stock, sales_2020_stock)
stock_comp <- ggplot(sales_comp_line_stock, aes(x = FSCL_WK_NBR, y = Sales, color = FSCL_YR ))+
geom_line(size = 1.25, aes(color = FSCL_YR))+
geom_smooth(size = .50, aes(color = FSCL_YR), se = FALSE, method = "auto")+
scale_x_continuous(breaks=seq(0,weeks,1))+
scale_y_continuous(labels = scales::dollar_format(scale = .0001, suffix = "K"))+
scale_color_manual(values=c("#0298F9", "#F96302"))+
theme_economist()+
ggtitle('Week-Over-Week Sales (Stock) 2021 v 2020')+
theme(
panel.grid.major = element_line(linetype = "dotted"),
axis.text = element_text( size = 10),
legend.position = c(0, 1),legend.justification = c(0, 1),
plot.title = element_text( size = 14, margin=margin(0,0,20,0), hjust = 0.5),
panel.background = element_rect(fill = NA),
strip.text = element_text(size=10)
)
stock_comp
My dataset looks as follows, these are the same vendors included in my parameter, how would I create a ggplot displaying sales month over month printed into an individual html output?:
Ideally one plot would be written like this:
ggplot(sample_vendor_tbl, aes(x = FSCL_MTH_NM, y = Sales)) +
geom_col()
To print out multiple ggplots within a document I would do the following:
for (i in vendor_nbr){
ggplot(mydata, aes(x = Month, y = sales))+
geom_col()
}
I am just confused about where we need to account for the parameter. How can I create a plot for the given vendor for its printout, similar to the example posted in your answer? I basically want to do exactly what you did in your answer with the charts, but using ggplot instead of base R.
To create a ggplot using parameters, I had to pull params$MVNDR_NBR into my dplyr verbs as follows:
sample_vendor_tbl %>%
filter(MVNDR_NBR == params$MVNDR_NBR) %>%
ggplot(aes(x = FSCL_MTH_NM, y = Sales)) +
geom_col()
I think the step you were missing was specifying the output filename so that each "vendor" would have its own file; otherwise, the same filename is overwritten each time leaving you with a single HTML document.
An example:
---
title: mtcars cyl
author: r2evans
params:
cyl: null
---
We have chosen to plot a histogram of `mtcars` where `$cyl` is equal to `r params$cyl`:
```{r}
dat <- subset(mtcars, cyl == params$cyl)
if (nrow(dat) > 0) {
hist(dat$disp)
}
```
Calling this with:
for (cy in c(4,6,8)) {
rmarkdown::render("~/StackOverflow/10466439/67525642.Rmd",
output_file = sprintf("cyl_%s.html", cy),
params = list(cyl = cy))
}
will render three HTML files, cyl_4.html, cyl_6.html, and cyl_8.html, each with differing content:
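Applied to the vendor case from the question, the same pattern might look like the sketch below. The second vendor number and the "vendor_%s.html" naming scheme are made up for illustration; the parameter name MVNDR_NBR is assumed to match a params entry declared in the YAML header of input.Rmd. The render() call is left commented out so the snippet runs without the Rmd file.

```r
# "60031167" appears in the question; the second number is a made-up example
vendors <- c("60031167", "60031168")

# One entry per vendor: a unique output file plus the parameter value for that run
jobs <- lapply(vendors, function(v) {
  list(output_file = sprintf("vendor_%s.html", v),
       params      = list(MVNDR_NBR = v))
})

# Each render() call knits the whole document with a different params$MVNDR_NBR
# for (job in jobs) {
#   rmarkdown::render("input.Rmd", output_file = job$output_file, params = job$params)
# }
```

Inside the document, no loop is needed: every chunk that filters on params$MVNDR_NBR sees the single value passed for that render.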
I have this function that works close to what I need -- it creates a clean table from my original raw data, makes it a ggplot, and uses lapply to run it through all the variables I want from the original table, data:
#Get colnames of all numeric variables
nlist <- names(data[,sapply(data,is.numeric)])
#Create function
varviz_n <- function(dat, var){
var <- dat[,which(names(dat) == var)]
title<-var
tab <- dat %>%
group_by(group = cut(var, breaks = seq(0, max(var), 10)),
groupedsupport) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
filter(!is.na(group),n>10)
tab2 <- tab %>%
group_by(groupedsupport) %>%
summarise(mean = mean(freq),
median = median(freq))
finaltab <- tab %>% left_join(tab2, by = "groupedsupport")
fplot <- finaltab %>%
ggplot(aes(fill=group,x=groupedsupport,y=freq)) +
geom_col(position="dodge") +
geom_text(aes(label = paste("n =",n), n = (n + 0.05)), position = position_dodge(0.9), vjust = 0, size=2) +
geom_errorbar(aes(groupedsupport, ymax = median, ymin = mean),
size=0.5, linetype = "longdash", inherit.aes = F, width = 1) +
scale_y_continuous(labels = scales::percent) +
xlab("") + ylab("") +
ggtitle(title) +
scale_fill_discrete("")
filename = filename <- paste0(finaltab$var)
ggsave(paste("Plots/",filename,".png"), width = 10, height = 7)
return(fplot)
}
#Run function
lapply(nlist, varviz_n, dat = data)
This does almost exactly what I want -- the problem is that all of the variables it's running through are 0-100 numeric and it's creating the plots but I can't at all figure out how to get the column name as the title of the plot or of the key. So I have no idea which graph is getting returned.
Can someone please help me figure out a way to get the column name from nlist to be the title of my plot? The way it is now prints out the first value of the column instead of the actual column name:
The final piece of code to save it in the 'Plots' folder doesn't work either since the title/var isn't populating correctly.
You can use something like this to create data to test out the code: data <- data.frame(v1 = sample(1:100,1000,replace=T),v2 = sample(1:100,1000,replace=T),v3 = sample(1:100,1000,replace=T),groupedsupport = sample(LETTERS[1:3],1000,replace = TRUE))
Thanks!
I think you just need to swap these steps:
var <- dat[,which(names(dat) == var)]
title <- var
should be
title <- var
var <- dat[,which(names(dat) == var)]
In the original order, var is overwritten with the selected column of data before title is set, so title ends up holding that vector of values and not the column name.
If this doesn't resolve it, please give us some code to mimic the contents of data.
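A stripped-down sketch of the same idea, which also covers the file name for saving: the helper plot_meta() is hypothetical and leaves out all of the plotting, showing only how the column name can be kept separate from the column values. It uses file.path() and paste0() for the path, because paste() with its default separator would insert spaces into it.

```r
# Small stand-in for the real data (an assumption for this example)
data <- data.frame(v1 = c(5, 15, 25), v2 = c(35, 45, 55),
                   groupedsupport = c("A", "B", "C"))

# Hypothetical helper: returns the title and save path a plot for `var` would use
plot_meta <- function(dat, var) {
  title  <- var         # keep the column name before anything can overwrite it
  values <- dat[[var]]  # store the column itself under a different name
  path   <- file.path("Plots", paste0(title, ".png"))  # no stray spaces
  list(title = title, path = path, n = length(values))
}

nlist <- names(data)[sapply(data, is.numeric)]
meta  <- lapply(nlist, plot_meta, dat = data)
```

In the full function, title would go to ggtitle() and path to ggsave().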
I've got a question regarding an edge case with ggplot2 in R.
They don't like you adding multiple legends, but I think this is a valid use case.
I've got a large economic dataset with the following variables.
year = year of observation
input_type = *labor* or *supply chain*
input_desc = specific type of labor (eg. plumbers OR building supplies respectively)
value = percentage of industry spending
And I'm building an area chart over approximately 15 years. There are 39 different input descriptions, so I'd like the user to see the two major components (internal employee spending OR outsourcing/supply spending) in two major color brackets (say green and blue), but ggplot won't let me group my colors in that way.
Here are a few things I tried.
Junk code to reproduce
spec_trend_pie<- data.frame("year"=c(2006,2006,2006,2006,2007,2007,2007,2007,2008,2008,2008,2008),
"input_type" = c("labor", "labor", "supply", "supply", "labor", "labor","supply","supply","labor","labor","supply","supply"),
"input_desc" = c("plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck"),
"value" = c(1,2,3,4,4,3,2,1,1,2,3,4))
spec_broad <- ggplot(data = spec_trend_pie, aes(y = value, x = year, group = input_type, fill = input_desc)) + geom_area()
Which gave me
Error in f(...) : Aesthetics can not vary with a ribbon
And then I tried this
sff4 <- ggplot() +
geom_area(data=subset(spec_trend_pie, input_type="labor"), aes(y=value, x=variable, group=input_type, fill= input_desc)) +
geom_area(data=subset(spec_trend_pie, input_type="supply_chain"), aes(y=value, x=variable, group=input_type, fill= input_desc))
Which gave me this image...so closer...but not quite there.
To give you an idea of what is desired, here's an example of something I was able to do in GoogleSheets a long time ago.
It's a bit of a hack but forcats might help you out. I did a similar post earlier this week:
How to factor sub group by category?
First some base data
library(tidyverse) # tibble, dplyr, forcats and ggplot2 are all used below
set.seed(123)
raw_data <-
tibble(
x = rep(1:20, each = 6),
rand = sample(1:120, 120) * (x/20),
group = rep(letters[1:6], times = 20),
cat = ifelse(group %in% letters[1:3], "group 1", "group 2")
) %>%
group_by(group) %>%
mutate(y = cumsum(rand)) %>%
ungroup()
Now, use factor levels to create gradients within colors
df <-
raw_data %>%
# create factors for group and category
mutate(
group = fct_reorder(group, y, max),
cat = fct_reorder(cat, y, max) # ordering in the stack
) %>%
arrange(cat, group) %>%
mutate(
group = fct_inorder(group), # takes the category into account first
group_fct = as.integer(group), # factor as integer
hue = as.integer(cat)*(360/n_distinct(cat)), # base hue values
light_base = 1-(group_fct)/(n_distinct(group)+2), # trust me
light = floor(light_base * 100) # new L value for hcl()
) %>%
mutate(hex = hcl(h = hue, l = light))
Create a lookup table for scale_fill_manual()
area_colors <-
df %>%
distinct(group, hex)
Lastly, make your plot
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "stack") +
scale_fill_manual(
values = area_colors$hex,
labels = area_colors$group
)
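The colour logic can also be seen in isolation on the four input descriptions from the question. This is a simplified sketch rather than the exact formula used above: each category gets its own hue, and each member of a category gets a different lightness. The chroma of 50 and the lightness steps of 25 are arbitrary choices for the example.

```r
# The four input descriptions and their types from the question's sample data
groups <- data.frame(
  group = c("plumber", "manager", "pipe", "truck"),
  cat   = c("labor", "labor", "supply", "supply")
)

# One base hue per category, spread evenly around the colour wheel
cat_id <- as.integer(factor(groups$cat))
groups$hue <- cat_id * (360 / length(unique(groups$cat)))

# Position of each group within its category (1, 2, ...)
groups$rank <- ave(seq_along(groups$group), groups$cat, FUN = seq_along)

# Lighter for the first member, darker for the next (arbitrary step size)
groups$light <- 80 - 25 * (groups$rank - 1)

# Same hue within a category, different lightness per member
groups$hex <- hcl(h = groups$hue, c = 50, l = groups$light)
```

The resulting hex column could be passed to scale_fill_manual(values = ...) in the same way as area_colors$hex.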