I'm trying to plot the evolution over time (x) of the different values found in a single column (y), which means drawing several lines on the same plot.
I can already create a separate plot for each value of y, so my problem seems to be specifically about putting several lines, one per value, on one plot (the subsets apparently have different lengths).
This is the dput of the columns "Date" and "Journal" I use from my dataset "test":
structure(list(Date = structure(c(9132, 9136, 9136, 9141, 9141,
9142), class = "Date", tzone = "Europe/Paris"), Journal = c("Libération",
"Libération", "Libération", "Libération", "Le Monde", "La Tribune (France)"
)), row.names = c(NA, -6L), class = c("data.table",
"data.frame"))
I used the following code to successfully create a barplot which shows the evolution of column "Journal" according to the column "Date".
dateplot <- ggplot(cleantest) + aes(x = format(Date, "%Y")) + geom_bar()
I also managed to create single line plots for each value of Y, with the following code :
valueplot <- ggplot(subset(test, Journal %in% "value")) + aes(x = format(Date, "%Y")) + geom_line(stat = "count", na.rm = TRUE, group = 1)
Therefore, I tried the following code to get, for example, two lines in the same plot; each attempt returned a different error:
jourplot <- ggplot(test, aes(x = format(Date, "%Y"))) + geom_line(aes(y = subset(test, Journal %in% "Libération"), colour = "blue")) + geom_line(aes(y = subset(test, Journal %in% "Le Figaro"), colour =
"red"))
The error is :
> Don't know how to automatically pick scale for object of type data.table/data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (17307): y
So I tried this :
jourplot <- ggplot(test, aes(x = format(Date, "%Y")) + geom_line(aes(y = subset(test, Journal %in% "Libération"), colour = "blue"), stat = "count", na.rm = TRUE, group = 1) + geom_line(aes(y = subset(test, Journal %in% "Le Figaro"), colour = "red"), stat = "count", na.rm = TRUE, group = 1"))
But this one doesn't even create the "jourplot" object.
There is obviously something wrong with my code and/or my data, but as a newbie I really don't see it. It seems to be about length, but how do I get around this? Or do the classes of the columns make things difficult for ggplot to process?
Does anyone understand what is going on?
Edit: I removed the "+" continuation symbols that the console prompt had added.
Is this your full dataset? In my opinion, your example is too small to get a sense of what you are trying to plot.
From my understanding, you are trying to plot the count of each journal per year. But your example covers only a few dates in 1995, and some journals appear only once, so you can't draw a line through a single point.
Here, I simulate a data frame with a sequence of dates covering every week for five years and randomly assign one of three journals to each week. Then I floor each date to its year and plot the count per year as follows:
library(lubridate)
rep_df <- data.frame(Date = seq(ymd("1995-01-01"),ymd("2000-01-01"), by = "weeks"),
Journal = sample(c("Liberation","Le Monde","Le Figaro"), 261, replace = TRUE))
rep_df$Year <- floor_date(rep_df$Date, unit = "year")
head(rep_df)
Date Journal Year
1 1995-01-01 Le Monde 1995-01-01
2 1995-01-08 Le Figaro 1995-01-01
3 1995-01-15 Liberation 1995-01-01
4 1995-01-22 Le Monde 1995-01-01
5 1995-01-29 Liberation 1995-01-01
6 1995-02-05 Liberation 1995-01-01
library(ggplot2)
ggplot(rep_df, aes(x = Year))+
geom_point(aes(color = Journal), stat = "count")+
geom_line(aes(color = Journal),stat = "count")+
scale_x_date(date_labels = "%Y", date_breaks = "1 year", name = "")
Does it look like what you are trying to get?
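If the real data has many dates per journal, another option is to count first and then map colour to Journal, instead of passing subset() results to aes(). This is only a minimal sketch: the small `test` table below is a made-up stand-in with invented dates, and base R's table() is my choice for the counting step, not something from the question.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's `test` table: same two columns,
# invented dates spread over three years.
test <- data.frame(
  Date = as.Date(c("1995-01-02", "1995-06-10", "1996-03-05",
                   "1996-09-20", "1997-02-14", "1995-04-01",
                   "1996-11-30", "1997-07-07", "1997-12-24")),
  Journal = c(rep("Liberation", 5), rep("Le Monde", 4))
)
test$Year <- format(test$Date, "%Y")

# One row per (Year, Journal) pair holding the number of articles.
counts <- as.data.frame(table(Year = test$Year, Journal = test$Journal),
                        responseName = "n", stringsAsFactors = FALSE)
counts$Year <- as.integer(counts$Year)

# colour = Journal inside aes() gives one line (and legend entry) per journal.
jourplot <- ggplot(counts, aes(x = Year, y = n, colour = Journal)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(counts$Year))
```

Because every journal lives in the same long table, there is no length mismatch between layers, which is what the "Aesthetics must be either length 1 or the same as the data" error was complaining about.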
I have a very large dataset derived from a spreadsheet of the format below:
df = data.frame(name = c('Ger1', 'Ger2', 'Ger3', 'Ger4', 'Ger5', 'Ger6'),
issued = c('UKS', 'USD', 'UKS', 'UKS', 'USD', 'USD'),
mat = as.Date(c('2024-01-31', '2023-01-31', '2026-10-22', '2022-07-22', '2029-01-31', '2025-06-07')),
volume = c(0.476, 0.922, 0.580, 1.259, 0.932, 0.417))
Currently, I plot all the data on one very long ggplot with the following code:
chart1<-ggplot(df)+geom_bar(stat="identity",aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),panel.background = element_rect(fill='white',color='black'),legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
Now while that worked for a while, given the size the dataset has grown to it is no longer feasible to use that way. Therefore I'd like to plot the data based on the contents of the "issued" column.
I first thought about a condition statement of the type:
if (df$issued == "UKS"){
chart1<-ggplot(df)+geom_bar(stat="identity",aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),panel.background = element_rect(fill='white',color='black'),legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
}
Unfortunately, it didn't work (although on closer inspection my logic wasn't particularly well thought out).
I then tried the subset() function in the hope that it would let me plot only the data meeting my requirement, like so:
chart1<-ggplot(subset(df, 'issued' == "UKS"))+geom_bar(stat="identity",aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),panel.background = element_rect(fill='white',color='black'),legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
This particular code didn't throw any errors, but the chart that was produced had no data on it at all. Does anyone have any ideas on how I can filter and plot this data?
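You don't need quotes around column names in subset(). With quotes, 'issued' == "UKS" compares two literal strings, which is always FALSE, so every row is dropped and the plot comes out empty.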
ggplot(subset(df, issued == "UKS")) +
geom_bar(stat="identity", aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),
panel.background = element_rect(fill='white',color='black'),
legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
Or use a tidyverse way of plotting:
library(tidyverse)
df %>% filter(issued == "UKS") %>%
ggplot() +
geom_bar(stat="identity", aes(x=volume,y=name),fill="#1170aa")+
theme(title=element_text(size=12),
panel.background = element_rect(fill='white',color='black'),
legend.position='right')+
labs(title = "Total carriage by Volume on the day", x = "Volume", y = "Name")
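To see why the quoted version silently produced an empty chart, here is a small base R sketch using a cut-down copy of the example data (the `mat` column is left out for brevity). The split() call at the end is my own suggestion for getting one data frame per value of `issued`; it is not part of the original question.

```r
# Cut-down copy of the example data from the question.
df <- data.frame(
  name = c("Ger1", "Ger2", "Ger3", "Ger4", "Ger5", "Ger6"),
  issued = c("UKS", "USD", "UKS", "UKS", "USD", "USD"),
  volume = c(0.476, 0.922, 0.580, 1.259, 0.932, 0.417)
)

# Quoted: compares the string "issued" with the string "UKS",
# which is a single FALSE, so no row is kept.
empty <- subset(df, 'issued' == "UKS")

# Unquoted: compares the values in the column `issued`, row by row.
uks <- subset(df, issued == "UKS")

# One data frame per distinct value of `issued`, e.g. to plot each in turn.
by_issuer <- split(df, df$issued)
```

Each element of `by_issuer` can then be passed to the same ggplot() call in place of `df`.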
I am from a province which has 3 different areas.
I have a data frame with the daily COVID-19 deaths for the whole province. My idea is to plot the data by week or by month, i.e. the sum over each 7 or 30 days. But I want to use 3 colours to distinguish the 3 different areas.
This is my code so far. I can plot the total column. The 3 areas are called alicante, valencia and castellon.
I don't know how to do it!
library(ggplot2)
library(scales)
log <- read.csv('https://dadesobertes.gva.es/es/datastore/dump/69c32771-3d18-4654-8c3c-cb423fcfa652?bom=True',stringsAsFactors = F,encoding = 'UTF-8')
colnames(log) <- c("code", "Date", "total", "hombres", "mujeres", "alicante", "castellon", "valencia", "dvinaros", "dcastellon", "dlaplana", "dsangunto", "dmalvarrosa", "dvilanova", "dlafe", "drequena", "dvalenciageneral", "dpeset", "dlaribera", "dgandia", "ddenia", "dxativa", "dalcoy", "dlamarina", "dsanjuan", "delda", "dalicantegeneral", "delchegeneral", "dorihuela", "dtorrevieja", "dmanises", "delchecrevillente" )
log$Date <- as.Date(log$Date,
"%Y-%m-%dT%H:%M:%S") # tabulate all the options here
# create variables of the week and month of each observation:
log$Mes <- as.Date(cut(log$Date,
breaks = "month"))
log$Week <- as.Date(cut(log$Date,
breaks = "week",
start.on.monday = FALSE)) # changes weekly break point to Sunday
# graph by week:
ggplot(data = log,
aes(Week, total, fill="Defunciones semanales")) +
stat_summary(fun.y = sum, # adds up all observations for the week
geom = "bar") +
labs(fill = "Color", y = "") +
#geom_text(aes(y = total,label = total), vjust=0, hjust= 0,size=4) +
labs(title = "Defunciones semanales en la Comunidad Valenciana hasta el 17 de Enero",
subtitle = "Fuente:dadesobertes.gva.es/es/dataset/covid-19-series-personas-fallecidas. ") +
scale_x_date(
#labels = date_format( "%B"),
labels = date_format( "%d-%m"),
limits=c(as.Date("2020-03-01"), as.Date("2021-02-01")),
breaks = "1 week") + # custom x-axis labels
theme(axis.text.x=element_text(angle=60, hjust=1))
This is more an issue of data wrangling than of plotting. To achieve your desired result, reshape your data to long format using e.g. tidyr::pivot_longer. Additionally, set position to "stack" in stat_summary to stack the bars for the areas.
library(ggplot2)
library(scales)
library(tidyr)
library(dplyr)
log <- read.csv("https://dadesobertes.gva.es/es/datastore/dump/69c32771-3d18-4654-8c3c-cb423fcfa652?bom=True", stringsAsFactors = F, encoding = "UTF-8")
colnames(log) <- c("code", "Date", "total", "hombres", "mujeres", "alicante", "castellon", "valencia", "dvinaros", "dcastellon", "dlaplana", "dsangunto", "dmalvarrosa", "dvilanova", "dlafe", "drequena", "dvalenciageneral", "dpeset", "dlaribera", "dgandia", "ddenia", "dxativa", "dalcoy", "dlamarina", "dsanjuan", "delda", "dalicantegeneral", "delchegeneral", "dorihuela", "dtorrevieja", "dmanises", "delchecrevillente")
log$Date <- as.Date(
log$Date,
"%Y-%m-%dT%H:%M:%S"
) # tabulate all the options here
log$Mes <- as.Date(cut(log$Date,
breaks = "month"
))
log$Week <- as.Date(cut(log$Date,
breaks = "week",
start.on.monday = FALSE
)) # changes weekly break point to Sunday
# select desired or needed variables and reshape to long format
log_area <- select(log, 1:2, 6:8, Mes, Week) %>%
pivot_longer(-c(code, Date, Mes, Week), names_to = "area")
# graph by week:
ggplot(
data = log_area,
aes(Week, value, fill = area)
) +
stat_summary(
fun.y = sum, # adds up all observations for the week
geom = "bar",
position = "stack"
) +
labs(
fill = "Color", y = "",
title = "Defunciones semanales en la Comunidad Valenciana hasta el 17 de Enero",
subtitle = "Fuente:dadesobertes.gva.es/es/dataset/covid-19-series-personas-fallecidas. "
) +
scale_x_date(
labels = date_format("%d-%m"),
limits = c(as.Date("2020-03-01"), as.Date("2021-02-01")),
breaks = "1 week"
) + # custom x-axis labels
theme(axis.text.x = element_text(angle = 60, hjust = 1))
#> Warning: `fun.y` is deprecated. Use `fun` instead.
#> Warning: Removed 87 rows containing non-finite values (stat_summary).
#> Warning: Removed 3 rows containing missing values (geom_bar).
Created on 2021-01-30 by the reprex package (v1.0.0)
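The reshaping step is easier to follow on a toy table. The sketch below is only an illustration: the `wide` data frame and its constant daily values are invented, and I use base R's aggregate() to show what the weekly sum per area looks like, whereas the plot above lets stat_summary do that summing.

```r
library(tidyr)

# Invented wide table: one row per day, one column per area.
wide <- data.frame(
  Date = as.Date("2020-03-01") + 0:13,  # two full Sunday-to-Saturday weeks
  alicante = rep(1, 14),
  castellon = rep(2, 14),
  valencia = rep(3, 14)
)
wide$Week <- as.Date(cut(wide$Date, breaks = "week", start.on.monday = FALSE))

# Long format: one row per (day, area), with the area name in its own column.
long <- pivot_longer(wide, cols = c(alicante, castellon, valencia),
                     names_to = "area", values_to = "value")

# Weekly total per area, i.e. the height of each stacked bar segment.
weekly <- aggregate(value ~ Week + area, data = long, FUN = sum)
```

Once the area is a column rather than three separate columns, `fill = area` has something to map to, which is what makes the stacked colours possible.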
I'm trying to generate a stacked line/area graph using ggplot and geom_area. As far as I can tell, my data is loaded into R correctly. Every time I generate the plot, the graph is empty (even though the axes look correct, except that the months are sorted alphabetically).
I've tried using the data.frame function to define my variables but was unable to generate my plot. I've also looked around Stack Overflow and other websites, but no one seems to have this issue of an empty plot with no errors.
Here's my data set:
Here's the code I'm using currently:
ggplot(OHV, aes(x=Month)) +
geom_area(aes(y=A+B+Unknown, fill="A")) +
geom_area(aes(y=B, fill="B")) +
geom_area(aes(y=Unknown, fill="Unknown"))
Here's the output at the end:
I have zero error messages, simply just no data being plotted on my graph.
Your dates are being interpreted as a factor, so geom_area has no continuous x axis to draw along. You must convert them to dates.
library(tidyverse)
set.seed(1)
df <- data.frame(Month = seq(lubridate::ymd('2018-01-01'),
lubridate::ymd('2018-12-01'), by = '1 month'),
Unknown = sample(17, replace = TRUE, size = 12),
V1 = floor(runif(12, min = 35, max = 127)),
V2 = floor(runif(12, min = 75, max = 275)))
df <- df %>%
dplyr::mutate(Month = format(Month, '%b')) %>%
tidyr::gather(key = "Variable", value = "Value", -Month)
ggplot2::ggplot(df) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack')
Note that I used tidyr::gather to reshape the data so that the areas are easier to stack.
Now, assuming your year of analysis is 2018, you need to convert the month column of your data frame into something R treats as continuous, such as a Date.
df2 <- df %>%
dplyr::mutate(Month = paste0("2018-", Month, "-01"),
Month = lubridate::parse_date_time(Month,"y-b-d"),
Month = as.Date(Month))
library(scales)
ggplot2::ggplot(df2) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack') +
scale_x_date(labels = scales::date_format("%b"))
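If parsing the month names with lubridate gives trouble (for example in a non-English locale), the same conversion can be sketched in base R. This assumes the months are English three-letter abbreviations and, as above, that the year is 2018; the `months_chr` vector is invented for illustration.

```r
# Invented example: month abbreviations stored as plain text.
months_chr <- c("Mar", "Jan", "Apr", "Feb")

# Text sorts alphabetically, which is the axis order seen in the question.
alphabetical <- sort(months_chr)

# Look up each abbreviation's position in the built-in month.abb vector
# and build a real Date on the first of that month (year 2018 assumed).
month_date <- as.Date(sprintf("2018-%02d-01", match(months_chr, month.abb)))

# Alternative when only the ordering matters: a factor with calendar levels.
# Note this is still discrete, so geom_area would also need group = Variable.
month_fct <- factor(months_chr, levels = month.abb)
```

The Date version gives geom_area a continuous axis; the factor version only fixes the ordering of the labels.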
I've got a question regarding an edge case with ggplot2 in R.
The ggplot2 developers don't like you adding multiple legends, but I think this is a valid use case.
I've got a large economic dataset with the following variables.
year = year of observation
input_type = *labor* or *supply chain*
input_desc = specific type of labor (eg. plumbers OR building supplies respectively)
value = percentage of industry spending
I'm building an area chart covering approximately 15 years. There are 39 different input descriptions, so I'd like the user to see the two major components (internal employee spending OR outsourcing/supply spending) in two major color brackets (say green and blue), but ggplot won't let me group my colors that way.
Here are a few things I tried.
Junk code to reproduce
spec_trend_pie<- data.frame("year"=c(2006,2006,2006,2006,2007,2007,2007,2007,2008,2008,2008,2008),
"input_type" = c("labor", "labor", "supply", "supply", "labor", "labor","supply","supply","labor","labor","supply","supply"),
"input_desc" = c("plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck"),
"value" = c(1,2,3,4,4,3,2,1,1,2,3,4))
spec_broad <- ggplot(data = spec_trend_pie, aes(y = value, x = year, group = input_type, fill = input_desc)) + geom_area()
Which gave me
Error in f(...) : Aesthetics can not vary with a ribbon
And then I tried this
sff4 <- ggplot() +
geom_area(data=subset(spec_trend_pie, input_type="labor"), aes(y=value, x=variable, group=input_type, fill= input_desc)) +
geom_area(data=subset(spec_trend_pie, input_type="supply_chain"), aes(y=value, x=variable, group=input_type, fill= input_desc))
Which gave me this image...so closer...but not quite there.
To give you an idea of what is desired, here's an example of something I was able to do in GoogleSheets a long time ago.
It's a bit of a hack but forcats might help you out. I did a similar post earlier this week:
How to factor sub group by category?
First some base data
library(tidyverse) # tibble, dplyr, forcats and ggplot2 are all used below
set.seed(123)
raw_data <-
tibble(
x = rep(1:20, each = 6),
rand = sample(1:120, 120) * (x/20),
group = rep(letters[1:6], times = 20),
cat = ifelse(group %in% letters[1:3], "group 1", "group 2")
) %>%
group_by(group) %>%
mutate(y = cumsum(rand)) %>%
ungroup()
Now, use factor levels to create gradients within colors
df <-
raw_data %>%
# create factors for group and category
mutate(
group = fct_reorder(group, y, max),
cat = fct_reorder(cat, y, max) # ordering in the stack
) %>%
arrange(cat, group) %>%
mutate(
group = fct_inorder(group), # takes the category into account first
group_fct = as.integer(group), # factor as integer
hue = as.integer(cat)*(360/n_distinct(cat)), # base hue values
light_base = 1-(group_fct)/(n_distinct(group)+2), # fraction in (0, 1): later groups get darker, the +2 keeps it above 0
light = floor(light_base * 100) # new L value for hcl()
) %>%
mutate(hex = hcl(h = hue, l = light))
Create a lookup table for scale_fill_manual()
area_colors <-
df %>%
distinct(group, hex)
Lastly, make your plot
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "stack") +
scale_fill_manual(
values = area_colors$hex,
labels = area_colors$group
)
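The colour logic can also be sketched on its own in base R, which may make it easier to adapt to the `input_type` / `input_desc` columns from the question. The numbers here (chroma 50, lightness steps of 15) are arbitrary choices of mine and not taken from the code above; the `groups` table is invented.

```r
# Invented lookup: six sub-groups belonging to two categories.
groups <- data.frame(
  group = letters[1:6],
  cat = rep(c("group 1", "group 2"), each = 3)
)

# One base hue per category, spread evenly around the colour wheel.
cat_id <- as.integer(factor(groups$cat))
groups$hue <- cat_id * 360 / length(unique(groups$cat))

# Within each category, step the lightness down so sub-groups differ in shade.
rank_in_cat <- ave(seq_along(groups$cat), groups$cat, FUN = seq_along)
groups$light <- 80 - 15 * (rank_in_cat - 1)

# hcl() turns hue / chroma / lightness into hex colour strings.
groups$hex <- hcl(h = groups$hue, c = 50, l = groups$light)

# Named vector: names are the fill levels, values are the colours.
palette_lookup <- setNames(groups$hex, groups$group)
```

A named vector like `palette_lookup` can be passed as `values` to scale_fill_manual(), which then matches colours to fill levels by name rather than by position.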
I'm wondering if there is any easy way to change the name in a legend (given using the colour aesthetic) on a ggplot after the plot is created. I know this feels a bit hacky and would normally be changed in the data or when the plot is created, but I want to change the label on a plot that is created by another package, and there's no option in the package to change it.
I could obviously copy the function and save my own version and change it, but I just want to change one thing so it seems neater if I can just do it afterwards.
Here is an example with some dummy data, basically I want to relabel the Mean and Median timeseries that come out of fasstr's plot_daily_stats to "Modelled Mean" and "Modelled Median" so they cannot be confused with the observed mean which I am manually adding.
library(fasstr)
library(tibble)
library(ggplot2)
#create some fake data
df <- tibble(Date = seq.Date(from = as.Date("1991-01-01"), as.Date("1997-12-31"),
by = "day"),
DayOfYear = as.numeric(format(Date, "%j")),
Value = runif(2557,0,1) + 50 + (cos((1/60)*DayOfYear)+4))
obsdf <- tibble(Date = seq.Date(from = as.Date("1900-01-01"), as.Date("1900-12-31"),
by = "day"),
DayOfYear = as.numeric(format(Date, "%j")),
Value = runif(365,0,1) + 51 + (cos((1/60)*DayOfYear)+4))
# create plot using fasstr package
plt1<- fasstr::plot_daily_stats(df)
# add my own trace. I also want to rename the trace "Mean" to
# "Modelled Mean" to avoid confusion (and same with Median)
plt1$Daily_Statistics +
geom_line(data = obsdf, aes(x = Date, y = Value, colour = "Observed Mean"))+
scale_colour_manual(values = c("red", "black","blue"))
The names are given in fasstr as hard coded names:
daily_plots <- ... +
ggplot2::geom_line(ggplot2::aes(y = Median, colour = "Median")) +
ggplot2::geom_line(ggplot2::aes(y = Mean, colour = "Mean"))
No hacking needed, just add labels to your manual scale.
plt1$Daily_Statistics +
geom_line(data = obsdf, aes(x = Date, y = Value, colour = "Observed Mean"))+
scale_colour_manual(labels = c("Modelled Mean","Modelled Median","Observed Mean"),
values = c("red", "black","blue"))