I'm trying to generate a stacked line/area graph using ggplot and geom_area. As far as I can tell, my data is loaded into R correctly. Every time I generate the plot, the graph is empty (the axes look correct, except that the months are sorted alphabetically).
I've tried utilizing the data.frame function to define my variables but was unable to generate my plot. I've also looked around Stack Overflow and other websites, but no one seems to have the issue of no errors but still an empty plot.
Here's my data set:
Here's the code I'm using currently:
ggplot(OHV, aes(x=Month)) +
geom_area(aes(y=A+B+Unknown, fill="A")) +
geom_area(aes(y=B, fill="B")) +
geom_area(aes(y=Unknown, fill="Unknown"))
Here's the output at the end:
I get no error messages; there is simply no data plotted on my graph.
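Your dates are being interpreted as a factor, i.e. a discrete variable, so geom_area has no continuous axis to draw along. You must convert them to something continuous, such as a Date.
library(tidyverse)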
set.seed(1)
df <- data.frame(Month = seq(lubridate::ymd('2018-01-01'),
lubridate::ymd('2018-12-01'), by = '1 month'),
Unknown = sample(17, replace = T, size = 12),
V1 = floor(runif(12, min = 35, max = 127)),
V2 = floor(runif(12, min = 75, max = 275)))
df <- df %>%
dplyr::mutate(Month = format(Month, '%b')) %>%
tidyr::gather(key = "Variable", value = "Value", -Month)
ggplot2::ggplot(df) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack')
Note that I used tidyr::gather to be able to stack the areas in an easier way.
Now, assuming your year of analysis is 2018, you need to convert the Month column of your data frame to something that R treats as continuous, such as a Date.
df2 <- df %>%
dplyr::mutate(Month = paste0("2018-", Month, "-01"),
Month = lubridate::parse_date_time(Month,"y-b-d"),
Month = as.Date(Month))
library(scales)
ggplot2::ggplot(df2) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack') +
scale_x_date(labels = scales::date_format("%b"))
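If converting to real dates is not convenient, a possible alternative is to keep Month as a factor but fix its level order and give each series an explicit group. This is only a minimal sketch: the data frame months_df and its values are made up for illustration and are not the asker's data.

```r
library(ggplot2)

# Hypothetical long-format data with abbreviated month names
months_df <- data.frame(
  Month = rep(month.abb, times = 2),
  Variable = rep(c("A", "B"), each = 12),
  Value = c(1:12, 12:1)
)

# Order the levels by calendar position instead of alphabetically
months_df$Month <- factor(months_df$Month, levels = month.abb)

# With a discrete x axis, an explicit group tells ggplot2 which
# observations belong to the same area
p <- ggplot(months_df,
            aes(x = Month, y = Value, fill = Variable, group = Variable)) +
  geom_area(position = "stack")
```

The group aesthetic is what connects the observations here; without it, each month of a discrete axis is treated as its own group and nothing is drawn.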
Context, question and reproducible example:
Context: I am trying to compare time intervals on a "one-year" scale to visualize the days covered by each interval regardless of the years.
Problem: when geom_segment() is used, it draws a straight line between the start and the end, even when the "start" is after the "end" on the plot. When an interval has days in two different years, it highlights the wrong ones, i.e. the one not covered by the interval (see the 1st plot below).
Question: is it possible to obtain the second plot without manually creating the df_exp object? Any tips and tricks are welcome!
library(lubridate)
library(ggplot2)
# dummy dataset
df <- data.frame(
time_int = c(interval("2010-03-01", "2010-06-15"), interval("2015-10-23", "2016-02-20")),
obs = c("A", "B")
)
# current result
ggplot(data = df) +
geom_segment(aes(x = `year<-`(int_start(time_int), 0000),
xend = `year<-`(int_end(time_int), 0000),
y = obs, yend = obs)) +
labs(x = "Current segment output")
# expected output
df_exp <- data.frame(
time_int = c(interval("0000-03-01", "0000-06-15"),
interval("0000-10-23", "0000-12-31"),
interval("0000-01-01", "0000-02-20")),
obs = c("A", "B", "B")
)
ggplot(data = df_exp) +
geom_segment(aes(x = int_start(time_int),
xend = int_end(time_int),
y = obs, yend = obs)) +
labs(x = "Expected segment output")
Possible start:
Even though the output is very close to the expected one, creating these vectors of explicit days does not feel very efficient; maybe there is a smarter way.
# Possible solution: make days explicit
library(dplyr)
library(tibble)
purrr::map(.x = split(df, ~obs), .f = ~ seq(int_start(.x$time_int), int_end(.x$time_int), "1 day")) %>%
enframe() %>%
tidyr::unnest(value) %>%
ggplot(data = .) +
geom_point(aes(x = `year<-`(value, 0000), y = name))
Created on 2022-10-18 with reprex v2.0.2
I would like to make a heatmap with ggplot.
The results should be something like this (the y-axis needs to be reversed though):
A subset of example data is below. For the actual application the dataframe has 1000+ users instead of only 3. The gradient filling should be based on the value of the users.
Date <- seq(
from = as.POSIXct("2016-01-01 00:00"),
to = as.POSIXct("2016-12-31 23:00"),
by = "hour"
)
user1 <- runif(length(Date), min = 0, max = 10)
user2 <- runif(length(Date), min = 0, max = 10)
user3 <- runif(length(Date), min = 0, max = 10)
example <- data.frame(Date, user1, user2, user3)
example$hour <- format(example$Date, format = "%H:%M")
example$total <- rowSums(example[,c(2:4)])
I have tried several things by using the (fill = total) argument in combination with geom_tile, geom_raster and stat_density2d (like suggested in similar posts here). An example below:
ggplot(plotHuishoudens, aes(Date, hour, fill = Total)) +
geom_tile() +
scale_fill_gradient(low = "blue", high = "red")
This only shows individual points and does not treat the y-axis as a continuous variable (scale_y_continuous did not help either), even though the variable is continuous.
How can I create a heatmap like the example provided above?
And how could I make a nice cut-off on the y axis (e.g. per 3 hours instead of per hour)?
With the data defined this way you won't get the desired output, because example$Date is a POSIXct object: it holds both a date and a time of day, so every row gets its own x position.
So, you must map the x-axis to the day only:
ggplot(data = example) +
geom_raster(aes(x=as.Date(Date, format='%d%b%y'), y=hour, fill=total)) +
scale_fill_gradient(low = "blue", high = "red")
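To make the date/time split concrete, here is a small sketch with three made-up timestamps. The time zone is set to UTC explicitly on both sides, which is an assumption added here so that the day boundaries do not depend on the local time zone; the 3-hour bin is just one way to coarsen the hours.

```r
# Three hypothetical timestamps, two of them on the same day
stamps <- as.POSIXct(
  c("2016-01-01 00:00", "2016-01-01 13:00", "2016-01-02 07:00"),
  tz = "UTC"
)

# Day only: suitable for the x axis of the heatmap
days <- as.Date(stamps, tz = "UTC")

# Hour of day as an integer: suitable for a continuous y axis
hours <- as.integer(format(stamps, "%H", tz = "UTC"))

# Start hour of the 3-hour slot each timestamp falls into
slot_start <- (hours %/% 3) * 3
```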
For your second question, you can group hours like this:
library(dplyr)
library(cetcolor) # provides cet_pal(), used for the fill scale below
example <- example %>%
group_by(grp = rep(row_number(), length.out = n(), each = 4)) %>%
summarise(Date = as.Date(sample(Date, 1), format='%d%b%y'),
total = sum(total),
time_slot = paste(min(hour), max(hour), sep = "-"))
ggplot(data = example) +
geom_raster(aes(x = Date, y = time_slot, fill = total)) +
scale_fill_gradientn(colours = (cet_pal(6, name = "inferno"))) # I like gradients from "cetcolor" package
Please help!
I have case data I need to prepare for a report soon and just cannot get the graphs to display properly.
From a dataset with CollectionDate as the "record" of cases (i.e. multiple rows with the same date means more cases that day), I want to display Number of positive cases/total (positive + negative) cases for that day as a percent on the y-axis, with collection dates along the x-axis. Then I want to break down by region. Goal is to look like this but in terms of daily positives/# of tests rather than just positives vs negatives. I also want to add a horizontal line on every graph at 20%.
I have tried manipulating it before, in and after ggplot:
ggplot(df_final, aes(x =CollectionDate, fill = TestResult)) +
geom_bar(aes(y=..prop..)) +
scale_y_continuous(labels=percent_format())
Which is, again, close. But the percents are wrong because they are just taking the proportion of that day against counts of all days instead of per day.
Then I tried using tally() in the following command to count per region and aggregate:
df_final %>%
group_by(CollectionDate, Region, as.factor(TestResult)) %>%
filter(TestResult == "Positive") %>%
tally()
and I still cannot get the graphs right.
Suggestions?
A quick look at my data:
head(df_final)
Well, I have to say that I am not 100% sure that I got what you want, but anyway, this can be helpful.
The data: Since you are new here, I have to let you know that using a simple and reproducible version of your data will make it easier for the rest of us to answer. To do this you can simulate a data frame or any other object, or use the dput function on it.
library(ggplot2)
library(dplyr)
data <- data.frame(
# date
CollectionDate = sample(
seq(as.Date("2020-01-01"), by = "day", length.out = 15),
size = 120, replace = TRUE),
# result
TestResult = sample(c("Positive", "Negative"), size = 120, replace = TRUE),
# region
Region = sample(c("Region 1", "Region2"), size = 120, replace = TRUE)
)
With this data, you can do as follows to get the plots you want.
# General plot, positive cases proportion
data %>%
count(CollectionDate, TestResult, name = "cases") %>%
group_by(CollectionDate) %>%
summarise(positive_pro = sum(cases[TestResult == "Positive"])/sum(cases)) %>%
ggplot(aes(x = CollectionDate, y = positive_pro)) +
geom_col() +
geom_hline(yintercept = 0.2)
# positive proportion by day within region
data %>%
count(CollectionDate, TestResult, Region, name = "cases") %>%
group_by(CollectionDate, Region) %>%
summarise(
positive_pro = sum(cases[TestResult == "Positive"])/sum(cases)
) %>%
ggplot(aes(x = CollectionDate, y = positive_pro)) +
geom_col() +
# horizontal line at 20%
geom_hline(yintercept = 0.2) +
facet_wrap(~Region)
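The same daily share can also be computed without the intermediate count() step, because the mean of a logical vector is the proportion of TRUE values. The sketch below uses a small simulated data frame (tests, with made-up values) rather than the real df_final.

```r
library(dplyr)

set.seed(7)
# Hypothetical test records: one row per test
tests <- data.frame(
  CollectionDate = as.Date("2020-01-01") + sample(0:4, 60, replace = TRUE),
  Region = sample(c("Region 1", "Region 2"), 60, replace = TRUE),
  TestResult = sample(c("Positive", "Negative"), 60, replace = TRUE)
)

daily_rate <- tests %>%
  group_by(CollectionDate, Region) %>%
  summarise(
    # share of positive tests among all tests that day in that region
    positive_pro = mean(TestResult == "Positive"),
    n_tests = n(),
    .groups = "drop"
  )
```

daily_rate can then be passed to ggplot with geom_col(), geom_hline(yintercept = 0.2) and facet_wrap(~Region) as in the second plot; adding scale_y_continuous(labels = scales::percent) should show the axis as percentages.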
I can get you halfway there (refer to the comments in the code for clarifications). This code is for the counts per day per region (plotted separately for each region). I think you can tweak things further to calculate the counts per day per county too; and whole state should be a cakewalk. I wish you good luck with your report.
rm(list = ls())
library(dplyr)
library(magrittr)
library(ggplot2)
library(scales)
library(tidyr) #Needed for the spread() function
#Dummy data
set.seed(1984)
sdate <- as.Date('2000-03-09')
edate <- as.Date('2000-05-18')
dateslist <- as.Date(sample(as.numeric(sdate): as.numeric(edate), 10000, replace = TRUE), origin = '1970-01-01')
df_final <- data.frame(Region = rep_len(1:9, 10000),
CollectionDate = dateslist,
TestResult = sample(c("Positive", "Negative"), 10000, replace = TRUE))
#First tally the positive and negative cases
#by Region, CollectionDate, TestResult in that order
df_final %<>%
group_by(Region, CollectionDate, TestResult) %>%
tally()
#Then
#First spread the counts (in n)
#That is, create separate columns for Negative and Positive cases
#for each Region-CollectionDate combination
#Then calculate their proportions (as shown)
#Now you have Negative and Positive
#percentages by CollectionDate by Region
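df_final %<>%
spread(key = TestResult, value = n, fill = 0) %>%
#mutate() evaluates its arguments in order, so compute the total first;
#otherwise Positive would be divided by the already rescaled Negative
mutate(Total = Negative + Positive,
Negative = Negative/Total,
Positive = Positive/Total)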
#Plotting this now
#Since the percentages are available already
#Use geom_col() instead of geom_bar()
df_final %>% ggplot() +
geom_col(aes(x = CollectionDate, y = Positive, fill = "Positive"),
position = "identity", alpha = 0.4) +
geom_col(aes(x = CollectionDate, y = Negative, fill = "Negative"),
position = "identity", alpha = 0.4) +
facet_wrap(~ Region, nrow = 3, ncol = 3)
This yields:
I've got a question regarding an edge case with ggplot2 in R.
They don't like you adding multiple legends, but I think this is a valid use case.
I've got a large economic dataset with the following variables.
year = year of observation
input_type = *labor* or *supply chain*
input_desc = specific type of labor (eg. plumbers OR building supplies respectively)
value = percentage of industry spending
I'm building an area chart covering approximately 15 years. There are 39 different input descriptions, so I'd like the user to see the two major components (internal employee spending or outsourcing/supply spending) in two major color brackets (say green and blue), but ggplot won't let me group my colors in that way.
Here are a few things I tried.
Junk code to reproduce
spec_trend_pie<- data.frame("year"=c(2006,2006,2006,2006,2007,2007,2007,2007,2008,2008,2008,2008),
"input_type" = c("labor", "labor", "supply", "supply", "labor", "labor","supply","supply","labor","labor","supply","supply"),
"input_desc" = c("plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck"),
"value" = c(1,2,3,4,4,3,2,1,1,2,3,4))
spec_broad <- ggplot(data = spec_trend_pie, aes(y = value, x = year, group = input_type, fill = input_desc)) + geom_area()
Which gave me
Error in f(...) : Aesthetics can not vary with a ribbon
And then I tried this
sff4 <- ggplot() +
geom_area(data=subset(spec_trend_pie, input_type="labor"), aes(y=value, x=variable, group=input_type, fill= input_desc)) +
geom_area(data=subset(spec_trend_pie, input_type="supply_chain"), aes(y=value, x=variable, group=input_type, fill= input_desc))
Which gave me this image...so closer...but not quite there.
To give you an idea of what is desired, here's an example of something I was able to do in GoogleSheets a long time ago.
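For reference, the "Aesthetics can not vary with a ribbon" error most likely comes from the first attempt grouping by input_type while fill changes within each of those groups. A minimal sketch of one way around it, assuming it is acceptable to draw one area per input_desc, is to group by the same variable that is mapped to fill. This only removes the error; it does not yet produce the green/blue color brackets.

```r
library(ggplot2)

spec_trend_pie <- data.frame(
  year = rep(2006:2008, each = 4),
  input_type = rep(c("labor", "labor", "supply", "supply"), times = 3),
  input_desc = rep(c("plumber", "manager", "pipe", "truck"), times = 3),
  value = c(1, 2, 3, 4, 4, 3, 2, 1, 1, 2, 3, 4)
)

# fill is constant within each group, so every ribbon has a single colour
spec_broad <- ggplot(spec_trend_pie,
                     aes(x = year, y = value,
                         group = input_desc, fill = input_desc)) +
  geom_area()
```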
It's a bit of a hack but forcats might help you out. I did a similar post earlier this week:
How to factor sub group by category?
First some base data
library(tidyverse) # tibble, dplyr, forcats and ggplot2 are all used below
set.seed(123)
raw_data <-
tibble(
x = rep(1:20, each = 6),
rand = sample(1:120, 120) * (x/20),
group = rep(letters[1:6], times = 20),
cat = ifelse(group %in% letters[1:3], "group 1", "group 2")
) %>%
group_by(group) %>%
mutate(y = cumsum(rand)) %>%
ungroup()
Now, use factor levels to create gradients within colors
df <-
raw_data %>%
# create factors for group and category
mutate(
group = fct_reorder(group, y, max),
cat = fct_reorder(cat, y, max) # ordering in the stack
) %>%
arrange(cat, group) %>%
mutate(
group = fct_inorder(group), # takes the category into account first
group_fct = as.integer(group), # factor as integer
hue = as.integer(cat)*(360/n_distinct(cat)), # base hue values
light_base = 1-(group_fct)/(n_distinct(group)+2), # trust me
light = floor(light_base * 100) # new L value for hcl()
) %>%
mutate(hex = hcl(h = hue, l = light))
Create a lookup table for scale_fill_manual()
area_colors <-
df %>%
distinct(group, hex)
Lastly, make your plot
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "stack") +
scale_fill_manual(
values = area_colors$hex,
labels = area_colors$group
)
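One optional refinement, sketched here with a made-up lookup table of the same shape as area_colors: scale_fill_manual() also accepts a named vector, which ties each color to its factor level by name rather than by position. The hue and lightness values below are arbitrary.

```r
# Hypothetical lookup table: two hues (categories), two lightness steps each
area_colors <- data.frame(
  group = c("a", "b", "c", "d"),
  hex = hcl(h = c(180, 180, 360, 360), l = c(70, 40, 70, 40)),
  stringsAsFactors = FALSE
)

# Named vector: names are the factor levels, values are the colours
fill_values <- setNames(area_colors$hex, area_colors$group)
```

Passing scale_fill_manual(values = fill_values) then keeps the mapping intact even if the row order of the lookup table changes.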
I want to draw a range plot or scatter plot for a temperature time series in R. For each region, I need to calculate the mean temperature and the total precipitation for the first 10 years and for the last 10 years. I then want to make a range plot of a reference year's gdp_percapita (let's say gdp_percapita in 1995) against those first-decade and last-decade temperature means and precipitation totals.
reproducible data:
Here is the reproducible data that simulated with actual temperature time series:
dat= data.frame(index = rep(c('dex111', 'dex112', 'dex113','dex114','dex115'), each = 30), year =1980:2009,
region= rep(c('Berlin','Stuttgart','Böblingen','Wartburgkreis','Eisenach'), each=30),
gdp_percapita=rep(sample.int(40, 30), 5),gva_agr_perworker=rep(sample.int(45, 30), 5),
temperature=rep(sample.int(50, 30), 5), precipitation=rep(sample.int(60, 30), 5))
update:
Here is what I did so far:
library(tidyverse)
func <- dat %>%
group_by(temperature, precipitation) %>%
summarize_all(funs(mean, sum))
It seems this is not the right way to get the mean temperature and total precipitation for the first ten and last ten years. Any corrections?
func %>%
gather(year, region, temperature, precipitation, gdp_percapita) %>%
separate(col, into = c("Measurement", "stat")) %>%
arrange(region) %>%
mutate_at(vars(col, Measurement), fct_inorder) %>%
spread(col, val)
But the code above does not produce data suitable for plotting, and I don't know what went wrong. Any idea?
I know ggplot2 is well suited to rendering this kind of range plot, but my attempt to reshape the data for plotting is not correct. How can I make this plot with ggplot2?
update:
Note that I chose gdp_percapita of 2000 for all regions on the x-axis, with the periodic mean temperature difference and precipitation sum difference on the y-axis for all regions.
desired plot:
here is the desired range plot for temperature and precipitation:
How do I accomplish my desired output with minimal code and efficiently? Could someone point me in the right direction?
Here's a solution that I think does what you want. In general, you should try to keep your questions narrower, because just saying "I don't know what went wrong" makes the question difficult for others to use.
There's a few steps here. I want to get the data into the format of one row per region to plot with summarise, using this to get the arguments for the aesthetics we'll need (geom_point and geom_linerange). Then, to plot the two different groups, we'll gather them so that decade can become a group variable.
N.B. I edited the sample data so that it no longer had every group with the exact same data for a little bit of variety.
geom_text_repel is a nice function from the ggrepel package that makes labels easier to add. We want to filter to just one of the groups so that the labels don't appear twice.
library(tidyverse)
set.seed(2346)
dat <- data.frame(
index = rep(c("dex111", "dex112", "dex113", "dex114", "dex115"), each = 30),
year = 1980:2009,
region = rep(c("Berlin", "Stuttgart", "Böblingen", "Wartburgkreis", "Eisenach"), each = 30),
ln_gdp_percapita = sample.int(40, 150, replace = TRUE),
ln_gva_agr_perworker = sample.int(45, 150, replace = TRUE),
temperature = sample.int(50, 150, replace = TRUE),
precipitation = sample.int(60, 150, replace = TRUE)
)
stats <- dat %>%
group_by(region) %>%
summarise(
ln_gdp = mean(ln_gdp_percapita),
range_max = max(temperature),
range_min = min(temperature),
decade_80s = mean(temperature[which(year %in% 1980:1989)]),
decade_00s = mean(temperature[which(year %in% 2000:2009)])
) %>%
gather(decade, mean, decade_80s, decade_00s)
ggplot(stats, aes(x = ln_gdp)) +
geom_point(aes(y = mean, colour = decade)) +
geom_linerange(aes(ymin = range_min, ymax = range_max)) +
ggrepel::geom_text_repel(
data = . %>% filter(decade == "decade_00s"),
mapping = aes(y = mean, label = region)
)
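The code above only summarises temperature. A possible way to get the precipitation totals the question also asks for is to label each year with its decade and summarise both variables per region and decade. This is a sketch on a small made-up data frame (toy); the column names temp_mean and precip_sum are chosen here for illustration.

```r
library(dplyr)

set.seed(1)
# Hypothetical toy data: two regions, 30 years each
toy <- data.frame(
  region = rep(c("Berlin", "Stuttgart"), each = 30),
  year = rep(1980:2009, times = 2),
  temperature = sample.int(50, 60, replace = TRUE),
  precipitation = sample.int(60, 60, replace = TRUE)
)

decade_stats <- toy %>%
  # keep only the first and the last ten years
  filter(year <= 1989 | year >= 2000) %>%
  mutate(decade = if_else(year <= 1989, "first_10", "last_10")) %>%
  group_by(region, decade) %>%
  summarise(
    temp_mean = mean(temperature),
    precip_sum = sum(precipitation),
    .groups = "drop"
  )
```

The result has one row per region and decade, which is already the long shape that the colour = decade mapping in the plot expects.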
Created on 2018-06-15 by the reprex package (v0.2.0).